Novel DNA cloning method

Description 

The invention refers to a novel method for cloning DNA molecules using a 
homologous recombination mechanism between at least two DNA
molecules. Further, novel reagent kits suitable for DNA cloning are provided. 

Current methods for cloning foreign DNA in bacterial cells usually comprise 
the steps of providing a suitable bacterial vector, cleaving said vector with 
a restriction enzyme and in vitro-inserting a foreign DNA fragment in said 
vector. The resulting recombinant vectors are then used to transform 
bacteria. Although such cloning methods have been used successfully for 
about 20 years they suffer from several drawbacks. These drawbacks are, 
in particular, that the in vitro steps required for inserting foreign DNA in a 
vector are often very complicated and time-consuming, if no suitable 
restriction sites are available on the foreign DNA or the vector. 

Furthermore, current methods usually rely on the presence of suitable 
restriction enzyme cleavage sites in the vector into which the foreign DNA 
fragment is placed. This imposes two limitations on the final cloning 
product. First, the foreign DNA fragment can usually only be inserted into 
the vector at the position of such a restriction site or sites. Thus, the
cloning product is limited by the disposition of suitable restriction sites, and
cloning into regions of the vector where there is no suitable restriction site
is difficult and often imprecise. Second, since restriction sites are typically
4 to 8 base pairs in length, they occur multiple times as the
size of the DNA molecules being used increases. This represents a practical
limitation to the size of the DNA molecules that can be manipulated by most
current cloning techniques. In particular, the larger sizes of DNA cloned into
vectors such as cosmids, BACs, PACs and P1s are such that it is usually
impractical to manipulate them directly by restriction enzyme based
techniques. Therefore, there is a need for providing a new cloning method,
from which the drawbacks of the prior art have at least partly been
eliminated.
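
The size limitation can be illustrated with a rough calculation. The following
sketch assumes a random DNA sequence with equal base frequencies, which real
DNA only approximates; the function name and the example molecule sizes are
illustrative assumptions and are not taken from the present description. Under
this assumption a given site of k base pairs is expected about once every 4^k
base pairs.

```python
def expected_site_count(dna_length_bp: int, site_length_bp: int) -> float:
    """Expected number of occurrences of one specific recognition site.

    Assumes random sequence with equal frequencies of A, C, G and T, so
    each position matches a k-bp site with probability 1 / 4**k.
    """
    positions = max(dna_length_bp - site_length_bp + 1, 0)
    return positions / 4 ** site_length_bp


if __name__ == "__main__":
    # Illustrative molecule sizes (assumptions, not values from the text).
    molecules = {"plasmid": 5_000, "cosmid": 40_000, "BAC": 150_000}
    for name, size in molecules.items():
        for k in (4, 6, 8):
            count = expected_site_count(size, k)
            print(f"{name:8s} {size:>7d} bp, {k}-bp site: {count:8.2f} expected sites")
```

The expected count grows linearly with the size of the molecule, so a site
that is unique in a small plasmid is unlikely to remain unique in a
cosmid- or BAC-sized molecule.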

According to the present invention it was found that an efficient 
homologous recombination mechanism between two DNA molecules occurs 
at usable frequencies in a bacterial host cell which is capable of expressing 
the products of the recE and recT genes or functionally related genes such
as the redα and redβ genes, or the phage P22 recombination system
(Kolodner et al., Mol. Microbiol. 11 (1994) 23-30; Fenton, A.C. and Poteete,
A.R., Virology 134 (1984) 148-160; Poteete, A.R. and Fenton, A.C.,
Virology 134 (1984) 161-167). This novel method of cloning DNA
fragments is termed "ET cloning". 

The identification and characterization of the E.coli RecE and RecT proteins
is described by Gillen et al. (J. Bacteriol. 145 (1981), 521-532) and Hall et al.
(J. Bacteriol. 175 (1993), 277-287). Hall and Kolodner (Proc. Natl. Acad. Sci.
USA 91 (1994), 3205-3209) disclose in vitro homologous pairing and
strand exchange of linear double-stranded DNA and homologous circular
single-stranded DNA promoted by the RecT protein. No reference to the
use of this method for the cloning of DNA molecules in cells can be
found therein.

The recET pathway of genetic recombination in E.coli is known (Hall and
Kolodner (1994), supra; Gillen et al. (1981), supra). This pathway requires
the expression of two genes, recE and recT. The DNA sequence of these
genes has been published (Hall et al., supra). The RecE protein is similar to
bacteriophage proteins, such as λ exo or λ Redα (Gillen et al.,
J.Mol.Biol. 113 (1977), 27-41; Little, J.Biol.Chem. 242 (1967), 679-686;
Radding and Carter, J.Biol.Chem. 246 (1971), 2513-2518; Joseph and
Kolodner, J.Biol.Chem. 258 (1983), 10418-10424). The RecT protein is
similar to bacteriophage proteins, such as λ β-protein or λ Redβ (Hall et al.
(1993), supra; Muniyappa and Radding, J.Biol.Chem. 261 (1986), 7472-7478;
Kmiec and Hollomon, J.Biol.Chem. 256 (1981), 12636-12639). The
content of the above-cited documents is incorporated herein by reference.

Oliner et al. (Nucl. Acids Res. 21 (1993), 5192-5197) describe in vivo
cloning of PCR products in E.coli by intermolecular homologous
recombination between a linear PCR product and a linearized plasmid
vector. Other previous attempts to develop new cloning methods based on
homologous recombination in prokaryotes, too, relied on the use of
restriction enzymes to linearise the vector (Bubeck et al., Nucleic Acids Res.
21 (1993), 3601-3602; Oliner et al., Nucleic Acids Res. 21 (1993), 5192-5197;
Degryse, Gene 170 (1996), 45-50) or on the host-specific recA-dependent
recombination system (Hamilton et al., J. Bacteriol. 171 (1989),
4617-4622; Yang et al., Nature Biotech. 15 (1997), 859-865; Dabert and
Smith, Genetics 145 (1997), 877-889). These methods are of very limited
applicability and are hardly used in practice.

The novel method of cloning DNA according to the present invention does 
not require in vitro treatments with restriction enzymes or DNA ligases and 
is therefore fundamentally distinct from the standard methodologies of DNA 
cloning. The method relies on a pathway of homologous recombination in
E.coli involving the recE and recT gene products, or the redα and redβ gene
products, or functionally equivalent gene products. The method covalently
combines one preferably linear and preferably extrachromosomal DNA 
fragment, the DNA fragment to be cloned, with one second preferably 
circular DNA vector molecule, either an episome or the endogenous host 
chromosome or chromosomes. It is therefore distinct from previous 
descriptions of cloning in E.coli by homologous recombination which either 
rely on the use of two linear DNA fragments or different recombination 
pathways. 

The present invention provides a flexible way to use homologous
recombination to engineer large DNA molecules including an intact > 76 kb
plasmid and the E.coli chromosome. Thus, there is practically no limitation 
of target choice either according to size or site. Therefore, any recipient 
DNA in a host cell, from high copy plasmid to the genome, is amenable to 
precise alteration. In addition to engineering large DNA molecules, the 
invention outlines new, restriction enzyme-independent approaches to DNA 
design. For example, deletions between any two chosen base pairs in a 
target episome can be made by choice of oligonucleotide homology arms. 
Similarly, chosen DNA sequences can be inserted at a chosen base pair to 
create, for example, altered protein reading frames. Concerted combinations 
of insertions and deletions, as well as point mutations, are also possible. 
The application of these strategies is particularly relevant to complex or 
difficult DNA constructions, for example, those intended for homologous 
recombinations in eukaryotic cells, e.g. mouse embryonic stem cells. 
Further, the present invention provides a simple way to position site specific 
recombination target sites exactly where desired. This will simplify 
applications of site specific recombination in other living systems, such as 
plants and mice. 

A subject matter of the present invention is a method for cloning DNA 
molecules in cells comprising the steps:

a) providing a host cell capable of performing homologous 
recombination, 

b) contacting in said host cell a first DNA molecule which is capable 
of being replicated in said host cell with a second DNA molecule 
comprising at least two regions of sequence homology to regions on 
the first DNA molecule, under conditions which favour homologous 
recombination between said first and second DNA molecules and 

c) selecting a host cell in which homologous recombination between 
said first and second DNA molecules has occurred. 

In the method of the present invention the homologous recombination
preferably occurs via the recET mechanism, i.e. the homologous
recombination is mediated by the gene products of the recE and the recT
genes which are preferably selected from the E.coli genes recE and recT or
functionally related genes such as the phage λ redα and redβ genes.

The host cell suitable for the method of the present invention preferably is 
a bacterial cell, e.g. a gram-negative bacterial cell. More preferably, the host 
cell is an enterobacterial cell, such as Salmonella, Klebsiella or Escherichia. 
Most preferably the host cell is an Escherichia coli cell. It should be noted,
however, that the cloning method of the present invention is also suitable
for eukaryotic cells, such as fungi, plant or animal cells.

Preferably, the host cell used for homologous recombination and
propagation of the cloned DNA can be any cell, e.g. a bacterial strain, in
which the products of the recE and recT, or redα and redβ, genes are
expressed. The host cell may comprise the recE and recT genes located on
the host cell chromosome or on non-chromosomal DNA, preferably on a
vector, e.g. a plasmid. In a preferred case, the RecE and RecT, or Redα and
Redβ, gene products are expressed from two different regulatable
promoters, such as the arabinose-inducible BAD promoter or the lac
promoter, or from non-regulatable promoters. Alternatively, the recE and
recT, or redα and redβ, genes are expressed on a polycistronic mRNA from
a single regulatable or non-regulatable promoter. Preferably the expression
is controlled by regulatable promoters.

Especially preferred is also an embodiment, wherein the recE or redα gene
is expressed by a regulatable promoter. Thus, the recombinogenic potential
of the system is only elicited when required and, at other times, possible
undesired recombination reactions are limited. The recT or redβ gene, on
the other hand, is preferably overexpressed with respect to recE or redα.
This may be accomplished by using a strong constitutive promoter, e.g. the
EM7 promoter and/or by using a higher copy number of recT, or redβ,
versus recE, or redα, genes.

For the purpose of the present invention any recE and recT genes are 
suitable insofar as they allow a homologous recombination of first and 
second DNA molecules with sufficient efficiency to give rise to 
recombination products in more than 1 in 10^ cells transfected with DNA. 
The recE and recT genes may be derived from any bacterial strain or from 
bacteriophages or may be mutants and variants thereof. Preferred are recE 
and recT genes which are derived from E.coli or from E.coli bacteriophages, 
such as the redα and redβ genes from lambdoid phages, e.g. bacteriophage
λ.



More preferably, the recE or redα gene is selected from a nucleic acid
molecule comprising

(a) the nucleic acid sequence from position 1320 (ATG) to 2159 (GAC) as
depicted in Fig.7B,

(b) the nucleic acid sequence from position 1320 (ATG) to 1998 (CGA) as
depicted in Fig.14B,

(c) a nucleic acid encoding the same polypeptide within the degeneracy of
the genetic code and/or

(d) a nucleic acid sequence which hybridizes under stringent conditions with
the nucleic acid sequence from (a), (b) and/or (c).

More preferably, the recT or redβ gene is selected from a nucleic acid
molecule comprising

(a) the nucleic acid sequence from position 2155 (ATG) to 2961 (GAA) as
depicted in Fig.7B,

(b) the nucleic acid sequence from position 2086 (ATG) to 2868 (GCA) as
depicted in Fig.14B,

(c) a nucleic acid encoding the same polypeptide within the degeneracy of
the genetic code and/or

(d) a nucleic acid sequence which hybridizes under stringent conditions with
the nucleic acid sequences from (a), (b) and/or (c).

It should be noted that the present invention also encompasses mutants and 
variants of the given sequences, e.g. naturally occurring mutants and 
variants or mutants and variants obtained by genetic engineering. Further,
it should be noted that the recE gene depicted in Fig.7B is an already
truncated gene encoding amino acids 588-866 of the native protein.
Mutants and variants preferably have a nucleotide sequence identity of at
least 60%, preferably of at least 70% and more preferably of at least 80%
to the recE and recT sequences depicted in Fig.7B and 13B, and to the redα
and redβ sequences depicted in Fig.14B.
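
The identity thresholds can be made concrete with a small calculation. The
sketch below assumes two sequences that have already been aligned without
gaps and have equal length; how the alignment is obtained is outside its
scope, and the function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Assumption: both sequences are ungapped and of equal length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the three thresholds named in the text."""
    if pct >= 80.0:
        return "at least 80% (more preferred)"
    if pct >= 70.0:
        return "at least 70% (preferred)"
    if pct >= 60.0:
        return "at least 60%"
    return "below 60%"


if __name__ == "__main__":
    pct = percent_identity("ATGCATGCAT", "ATGCATGCAA")
    print(pct, identity_tier(pct))
```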

According to the present invention hybridization under stringent conditions 
preferably is defined according to Sambrook et al. (1989), infra, and 
comprises a detectable hybridization signal after washing for 30 min in 0.1 
X SSC, 0.5% SDS at 55°C, preferably at 62°C and more preferably at 
68°C. 

In a preferred case the recE and recT genes are derived from the 
corresponding endogenous genes present in the E.coli K12 strain and its 
derivatives or from bacteriophages. In particular, strains that carry the sbcA 
mutation are suitable. Examples of such strains are JC 8679 and JC 9604
(Gillen et al. (1981), supra). Alternatively, the corresponding genes may 
also be obtained from other coliphages such as lambdoid phages or phage 
P22. 

The genotype of JC 8679 and JC 9604 is Sex (Hfr, F+, F-, or F'): F-. JC
8679 comprises the mutations: recBC 21, recC 22, sbcA 23, thr-1, ara-14,
leu B 6, DE (gpt-proA) 62, lacY1, tsx-33, gluV44 (AS), galK2 (Oc), LAM-,
his-60, relA 1, rps L31 (strR), xyl A5, mtl-1, argE3 (Oc) and thi-1. JC 9604
comprises the same mutations and further the mutation recA 56.

Further, it should be noted that the recE and recT, or redα and redβ, genes
can be isolated from a first donor source, e.g. a donor bacterial cell and
transformed into a second receptor source, e.g. a receptor bacterial or
eukaryotic cell in which they are expressed by recombinant DNA means.

In one embodiment of the invention, the host cell used is a bacterial strain 
having an sbcA mutation, e.g. one of E.coli strains JC 8679 and JC 9604 
mentioned above. However, the method of the invention is not limited to 
host cells having an sbcA mutation or analogous cells. Surprisingly, it has
been found that the cloning method of the invention also works in cells
without an sbcA mutation, whether recBC+ or recBC-, e.g. also in prokaryotic
recBC+ host cells, e.g. in E.coli recBC+ cells. In that case preferably those
host cells are used in which the product of a recBC type exonuclease
inhibitor gene is expressed. Preferably, the exonuclease inhibitor is capable
of inhibiting the host recBC system or an equivalent thereof. A suitable
example of such an exonuclease inhibitor gene is the λ redγ gene (Murphy,
J.Bacteriol. 173 (1991), 5808-5821) and functional equivalents thereof,
respectively, which, for example, can be obtained from other coliphages
such as from phage P22 (Murphy, J.Biol.Chem. 269 (1994), 22507-22516).

More preferably, the exonuclease inhibitor gene is selected from a nucleic 
acid molecule comprising 

(a) the nucleic acid sequence from position 3588 (ATG) to 4002 (GTA) as 
depicted in Fig.14A, 

(b) a nucleic acid encoding the same polypeptide within the degeneracy of 
the genetic code and/or 

(c) a nucleic acid sequence which hybridizes under stringent conditions (as
defined above) with the nucleic acid sequence from (a) and/or (b).

Surprisingly, it has been found that the expression of an exonuclease 
inhibitor gene in both recBC+ and recBC- strains leads to significant 
improvement of cloning efficiency. 

The cloning method according to the present invention employs a
homologous recombination between a first DNA molecule and a second 
DNA molecule. The first DNA molecule can be any DNA molecule that 
carries an origin of replication which is operative in the host cell, e.g. an 
E.coli replication origin. Further, the first DNA molecule is present in a form 
which is capable of being replicated in the host cell. The first DNA 
molecule, i.e. the vector, can be any extrachromosomal DNA molecule 
containing an origin of replication which is operative in said host cell, e.g. 
a plasmid including single, low, medium or high copy plasmids or other
extrachromosomal circular DNA molecules based on cosmid, P1, BAC or
PAC vector technology. Examples of such vectors are described, for
example, by Sambrook et al. (Molecular Cloning, Laboratory Manual, 2nd
Edition (1989), Cold Spring Harbor Laboratory Press) and Ioannou et al.
(Nature Genet. 6 (1994), 84-89) or references cited therein. The first DNA
molecule can also be a host cell chromosome, particularly the E.coli 
chromosome. Preferably, the first DNA molecule is a double-stranded DNA 
molecule. 

The second DNA molecule is preferably a linear DNA molecule and 
comprises at least two regions of sequence homology, preferably of 
sequence identity to regions on the first DNA molecule. These homology or 
identity regions are preferably at least 15 nucleotides each, more preferably
at least 20 nucleotides and, most preferably, at least 30 nucleotides each.
Especially good results were obtained when using sequence homology
regions having a length of about 40 or more nucleotides, e.g. 60 or more
nucleotides. The two sequence homology regions can be located on the
linear DNA fragment so that one is at one end and the other is at the other
end; however, they may also be located internally. Preferably, the
second DNA molecule is also a double-stranded DNA molecule.

The two sequence homology regions are chosen according to the 
experimental design. There are no limitations on which regions of the first 
DNA molecule can be chosen for the two sequence homology regions
located on the second DNA molecule, except that the homologous
recombination event cannot delete the origin of replication of the first DNA
molecule. The sequence homology regions can be interrupted by non-identical
sequence regions as long as sufficient sequence homology is
retained for the homologous recombination reaction. By using sequence
homology arms having non-identical sequence regions compared to the
target site, mutations such as substitutions, e.g. point mutations, insertions
and/or deletions may be introduced into the target site by ET cloning.
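
The expected outcome of such a design can be sketched as a string operation.
The following model is a simplification under stated assumptions: the first
DNA molecule is treated as a linear, single-stranded text string, the homology
arms must match it exactly, and the function and parameter names are
hypothetical. It only predicts the sequence of the intended product and says
nothing about recombination efficiency.

```python
from typing import Optional


def et_recombine(target: str, left_arm: str, right_arm: str,
                 payload: str, origin: Optional[str] = None) -> str:
    """Predict the product when the region between two homology arms on
    `target` is replaced by `payload` (an empty payload gives a deletion).

    Simplifying assumptions: exact arm matches, linear representation,
    first occurrence of each arm is used.
    """
    left_start = target.find(left_arm)
    if left_start < 0:
        raise ValueError("left homology arm not found on target")
    left_end = left_start + len(left_arm)
    right_start = target.find(right_arm, left_end)
    if right_start < 0:
        raise ValueError("right homology arm not found downstream of left arm")
    removed = target[left_end:right_start]
    # The text excludes designs that delete the origin of replication.
    if origin and origin in removed:
        raise ValueError("design would delete the origin of replication")
    return target[:left_end] + payload + target[right_start:]


if __name__ == "__main__":
    target = "CCAAAATTTTTTGGGGCC"
    print(et_recombine(target, "AAAA", "GGGG", ""))      # deletion
    print(et_recombine(target, "AAAA", "GGGG", "ACGT"))  # replacement
```

Choosing arms that are directly adjacent on the target gives a pure
insertion, and choosing arms that flank a region with an empty payload gives
a deletion between the two chosen base pairs.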

The second foreign DNA molecule which is to be cloned in the bacterial cell 
may be derived from any source. For example, the second DNA molecule 
may be synthesized by a nucleic acid amplification reaction such as a PCR
where both of the DNA oligonucleotides used to prime the amplification
contain, in addition to sequences at the 3'-ends that serve as a primer for
the amplification, one or the other of the two homology regions. Using
oligonucleotides of this design, the DNA product of the amplification can be 
any DNA sequence suitable for amplification and will additionally have a 
sequence homology region at each end. 
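
The oligonucleotide design described above can be sketched as follows. This
is an illustration only: the function names, the default priming length of 20
nucleotides and the convention that all input sequences are written 5' to 3'
on the same strand are assumptions, not details given in the present
description.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(left_arm: str, right_arm: str, template: str,
                   priming_len: int = 20) -> tuple:
    """Build two oligonucleotides, each carrying one homology region at the
    5' end and a template-specific priming sequence at the 3' end."""
    if not 1 <= priming_len <= len(template):
        raise ValueError("priming_len must be between 1 and len(template)")
    forward = left_arm + template[:priming_len]
    # The reverse primer anneals to the top strand, so both parts are
    # reverse-complemented; the homology region stays at its 5' end.
    reverse = (reverse_complement(right_arm)
               + reverse_complement(template[-priming_len:]))
    return forward, reverse


def expected_product(left_arm: str, right_arm: str, template: str) -> str:
    """Top strand of the amplification product: arm + template + arm."""
    return left_arm + template + right_arm


if __name__ == "__main__":
    fwd, rev = design_primers("AAAA", "CCCC", "ATGGTGAGCAAGGGC", priming_len=5)
    print(fwd, rev)
```

The short arms in the usage example are for readability only; the text
recommends homology regions of considerably greater length.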

A specific example of the generation of the second DNA molecule is the 
amplification of a gene that serves to convey a phenotypic difference to the 
bacterial host cells, in particular, antibiotic resistance. A simple variation of 
this procedure involves the use of oligonucleotides that include other
sequences in addition to the PCR primer sequence and the sequence
homology region. A further simple variation is the use of more than two
amplification primers to generate the amplification product. A further simple
variation is the use of more than one amplification reaction to generate the
amplification product. A further variation is the use of DNA fragments
obtained by methods other than PCR, for example, by endonuclease or
restriction enzyme cleavage to linearize fragments from any source of DNA.

It should be noted that the second DNA molecule is not necessarily a single
species of DNA molecule; it is of course possible to use a heterogeneous
population of second DNA molecules, e.g. to generate a DNA library, such
as a genomic or cDNA library.

The method of the present invention may comprise the contacting of the 
first and second DNA molecules in vivo. In one embodiment of the present 
invention the second DNA fragment is transformed into a bacterial strain 
that already harbors the first vector DNA molecule. In a different 
embodiment, the second DNA molecule and the first DNA molecule are 
mixed together in vitro before co-transformation in the bacterial host cell. 
These two embodiments of the present invention are schematically depicted 
in Fig. 1. The method of transformation can be any method known in the art
(e.g. Sambrook et al. supra). The preferred method of transformation or co- 
transformation, however, is electroporation. 

After contacting the first and second DNA molecules under conditions
which favour homologous recombination between first and second DNA
molecules via the ET cloning mechanism, a host cell is selected in which
homologous recombination between said first and second DNA molecules
has occurred. This selection procedure can be carried out by several
different methods. Three preferred selection methods are
depicted in Fig. 2 and described in detail below.

In a first selection method a second DNA fragment is employed which 
carries a gene for a marker placed between the two regions of sequence 
homology wherein homologous recombination is detectable by expression 
of the marker gene. The marker gene may be a gene for a phenotypic 
marker which is not expressed in the host or from the first DNA molecule. 
Upon recombination by ET cloning, the change in phenotype of the host 
strain conveyed by the stable acquisition of the second DNA fragment 
identifies the ET cloning product. 

In a preferred case, the phenotypic marker is a gene that conveys resistance
to an antibiotic, in particular, genes that convey resistance to kanamycin,
ampicillin, chloramphenicol, tetracycline or any other substance that shows
bactericidal or bacteriostatic effects on the bacterial strain employed.

A simple variation is the use of a gene that complements a deficiency 
present within the bacterial host strain employed. For example, the host 
strain may be mutated so that it is incapable of growth without a metabolic 
supplement. In the absence of this supplement, a gene on the second DNA 
fragment can complement the mutational defect thus permitting growth. 
Only those cells which contain the episome carrying the intended DNA 
rearrangement caused by the ET cloning step will grow. 

In another example, the host strain carries a phenotypic marker gene which 
is mutated so that one of its codons is a stop codon that truncates the open 
reading frame. Expression of the full length protein from this phenotypic 
marker gene requires the introduction of a suppressor tRNA gene which, 
once expressed, recognizes the stop codon and permits translation of the 
full open reading frame. The suppressor tRNA gene is introduced by the ET 
cloning step and successful recombinants identified by selection for, or 
identification of, the expression of the phenotypic marker gene. In these 
cases, only those cells which contain the intended DNA rearrangement 
caused by the ET cloning step will grow. 

A further simple variation is the use of a reporter gene that conveys a
readily detectable change in colony colour or morphology. In a preferred
case, the green fluorescent protein (GFP) can be used and colonies
carrying the ET cloning product are identified by the fluorescence emissions of
GFP. In another preferred case, the lacZ gene can be used and colonies
carrying the ET cloning product are identified by a blue colony colour when
X-gal is added to the culture medium.

In a second selection method the insertion of the second DNA fragment into
the first DNA molecule by ET cloning alters the expression of a marker 
present on the first DNA molecule. In this embodiment the first DNA 
molecule contains at least one marker gene between the two regions of 
sequence homology and homologous recombination may be detected by an 
altered expression, e.g. lack of expression of the marker gene. 

In a preferred application, the marker present on the first DNA molecule is
a counter-selectable gene, such as the sacB, ccdB or
tetracycline-resistance genes. In these cases, bacterial cells that carry the first DNA
molecule unmodified by the ET cloning step after transformation with the
second DNA fragment, or co-transformation with the second DNA fragment
and the first DNA molecule, are plated onto a medium so that the expression of
the counter-selectable marker conveys a toxic or bacteriostatic effect on the
host. Only those bacterial cells which contain the first DNA molecule
carrying the intended DNA rearrangement caused by the ET cloning step
will grow.

In another preferred application, the first DNA molecule carries a reporter
gene that conveys a readily detectable change in colony colour or
morphology. In a preferred case, the green fluorescent protein (GFP) can
be present on the first DNA molecule and colonies carrying the first DNA
molecule with or without the ET cloning product can be distinguished by
differences in the fluorescence emissions of GFP. In another preferred case,
the lacZ gene can be present on the first DNA molecule and colonies
carrying the first DNA molecule with or without the ET cloning product
can be identified by a blue or white colony colour when X-gal is added to the
culture medium.

In a third selection method the integration of the second DNA fragment into 
the first DNA molecule by ET cloning removes a target site for a site 
specific recombinase, termed here an RT (for recombinase target) present 
on the first DNA molecule between the two regions of sequence homology.
A homologous recombination event may be detected by removal of the
target site.

In the absence of the ET cloning product, the RT is available for use by the
corresponding site specific recombinase. The difference between the
presence or not of this RT is the basis for selection of the ET cloning
product. In the presence of this RT and the corresponding site specific
recombinase, the site specific recombinase mediates recombination at this
RT and changes the phenotype of the host so that it is either not able to
grow or presents a readily observable phenotype. In the absence of this RT
the corresponding site specific recombinase is not able to mediate
recombination.

In a preferred case, the first DNA molecule to which the second DNA
fragment is directed contains two RTs, one of which is adjacent to, but not
part of, an antibiotic resistance gene. The second DNA fragment is directed,
by design, to remove this RT. Upon exposure to the corresponding site
specific recombinase, those first DNA molecules that do not carry the ET
cloning product will be subject to a site specific recombination reaction
between the RTs that removes the antibiotic resistance gene and therefore
the first DNA molecule fails to convey resistance to the corresponding
antibiotic. Only those first DNA molecules that contain the ET cloning
product, or have failed to be site specifically recombined for some other
reason, will convey resistance to the antibiotic.
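
The logic of this preferred case can be sketched with a toy model. The
representation below is an assumption made for illustration: a first DNA
molecule is reduced to an ordered list of feature labels, the two RTs are
taken to be arranged so that recombination excises what lies between them,
and all names are hypothetical.

```python
def apply_recombinase(features: list) -> list:
    """Excise everything between the first and last RT, leaving one RT.

    Molecules with fewer than two RTs are returned unchanged, because the
    recombinase has no pair of target sites to act on.
    """
    sites = [i for i, f in enumerate(features) if f == "RT"]
    if len(sites) < 2:
        return list(features)
    first, last = sites[0], sites[-1]
    return features[:first + 1] + features[last + 1:]


def conveys_resistance(features: list, gene: str = "resistance_gene") -> bool:
    """True if the resistance gene survives exposure to the recombinase."""
    return gene in apply_recombinase(features)


if __name__ == "__main__":
    unmodified = ["ori", "RT", "resistance_gene", "RT"]
    # ET cloning replaces the RT adjacent to the resistance gene.
    et_product = ["ori", "RT", "resistance_gene", "cloned_fragment"]
    print(conveys_resistance(unmodified))  # resistance gene is excised
    print(conveys_resistance(et_product))  # resistance gene is retained
```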

In another preferred case, the RT to be removed by ET cloning of the
second DNA fragment is adjacent to a gene that complements a deficiency
present within the host strain employed. In another preferred case, the RT
to be removed by ET cloning of the second DNA fragment is adjacent to a
reporter gene that conveys a readily detectable change in colony colour or
morphology.

In another preferred case, the RT to be removed by ET cloning [...] RT
so that the corresponding site specific recombinase can integrate the
episome, via its RT, into the RT site in the host genome. Other preferred
[...] RTs include those described from existing examples of site specific
recombination as well as natural or mutated variations thereof.

The preferred site specific recombinases include Cre, FLP, Kw or any site
specific recombinase of the integrase class. Other preferred site specific
recombinases include site specific recombinases of the
resolvase/transposase class.

There are no limitations on the method of expression of the site specific
recombinase. In a preferred case, the expression of the
site specific recombinase is regulated so that expression can be induced and
quenched according to the optimisation of the ET cloning efficiency. In this
case, the site specific recombinase gene can be either integrated into the
host genome or carried on an episome. In another preferred case, the site
specific recombinase is expressed from an episome that carries a
conditional origin of replication so that it can be eliminated from the host
cell.
In another preferred case, at least two of the above three selection methods
are combined. A particularly preferred case involves a two-step use of the
first selection method above, followed by use of the second selection
method. This combined use requires, most simply, that the DNA fragment
to be cloned includes a gene, or genes, that permits the identification, in the
first step, of correct ET cloning products by the acquisition of a phenotypic
change. In a second step, expression of the gene or genes introduced in the
first step is altered so that a second round of ET cloning products can be
identified. In a preferred example, the gene employed is the tetracycline
resistance gene and the first step ET cloning products are identified by the
acquisition of tetracycline resistance. In the second step, loss of expression
of the tetracycline gene is identified by loss of sensitivity to nickel chloride,
fusaric acid or any other agent that is toxic to the host cell when the
tetracycline gene is expressed. This two-step procedure permits the
identification of ET cloning products by first the integration of a gene that
conveys a phenotypic change on the host, and second by the loss of a
related phenotypic change, most simply by removal of some of the DNA
sequences integrated in the first step. Thereby the genes used to identify
ET cloning products can be inserted and then removed to leave ET cloning
products that are free of these genes.
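
The two-step procedure can be pictured as two successive filters on a
population of clones. The sketch below is a deliberately simplified model:
the class `Clone`, its single flag and the two function names are
hypothetical, and real selections are neither perfectly efficient nor free
of background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    has_tet_gene: bool  # True if the tetracycline gene is present and expressed


def select_step_one(clones: list) -> list:
    """First step: plating on tetracycline keeps clones that acquired the gene."""
    return [c for c in clones if c.has_tet_gene]


def select_step_two(clones: list) -> list:
    """Second step: plating on an agent that is toxic when the gene is
    expressed (e.g. nickel chloride or fusaric acid) keeps clones that lost it."""
    return [c for c in clones if not c.has_tet_gene]


if __name__ == "__main__":
    round_one = [Clone("parent", False), Clone("integrant", True)]
    print([c.name for c in select_step_one(round_one)])
    round_two = [Clone("integrant", True), Clone("clean_product", False)]
    print([c.name for c in select_step_two(round_two)])
```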

In a further embodiment of the present invention the ET cloning may also 
be used for a recombination method comprising the steps of 

a) providing a source of RecE and RecT, or Redα and Redβ, proteins,

b) contacting a first DNA molecule which is capable of being replicated in
a suitable host cell with a second DNA molecule comprising at least two
regions of sequence homology to regions on the first DNA molecule, under
conditions which favour homologous recombination between said first and
second DNA molecules and

c) selecting DNA molecules in which a homologous recombination between
said first and second DNA molecules has occurred.

The source of RecE and RecT, or Redα and Redβ, proteins may be either
purified or partially purified RecE and RecT, or Redα and Redβ, proteins or
cell extracts comprising RecE and RecT, or Redα and Redβ, proteins.

The homologous recombination event in this embodiment may occur in
vitro, e.g. when providing a cell extract containing further components
required for homologous recombination. The homologous recombination
event, however, may also occur in vivo, e.g. by introducing RecE and RecT,
or Redα and Redβ, proteins or the extract in a host cell (which may be
recET positive or not, or redαβ positive or not) and contacting the DNA
molecules in the host cell. When the recombination occurs in vitro the
selection of DNA molecules may be accomplished by transforming the
recombination mixture in a suitable host cell and selecting for positive
clones as described above. When the recombination occurs in vivo the
selection methods as described above may directly be applied.

A further subject matter of the invention is the use of cells, preferably
bacterial cells, most preferably E.coli cells, capable of expressing the recE
and recT, or redα and redβ, genes as a host cell for a cloning method
involving homologous recombination.

Still a further subject matter of the invention is a vector system capable of
expressing recE and recT, or redα and redβ, genes in a host cell and its use
for a cloning method involving homologous recombination. Preferably the
vector system is also capable of expressing an exonuclease inhibitor gene
as defined above, e.g. the λ redγ gene. The vector system may comprise at
least one vector. The recE and recT, or redα and redβ, genes are preferably
located on a single vector and more preferably under control of a
regulatable promoter which may be the same for both genes or a single
promoter for each gene. Especially preferred is a vector system which is
capable of overexpressing the recT, or redβ, gene versus the recE, or redα,
gene.



Still a further subject matter of the invention is the use of a source of RecE
and RecT, or Redα and Redβ, proteins for a cloning method involving
homologous recombination.

A still further subject matter of the invention is a reagent kit for cloning
comprising

(a) a host cell, preferably a bacterial host cell,

(b) means of expressing recE and recT, or redα and redβ, genes in
said host cell, e.g. comprising a vector system, and

(c) a recipient cloning vehicle, e.g. a vector, capable of being 
replicated in said cell. 

On the one hand, the recipient cloning vehicle which corresponds to the 
first DNA molecule of the process of the invention can already be present 
in the bacterial cell. On the other hand, it can be present separated from the 
bacterial cell. 

In a further embodiment the reagent kit comprises 

(a) a source for RecE and RecT, or Redα and Redβ, proteins and

(b) a recipient cloning vehicle capable of being propagated in a host cell and

(c) optionally a host cell suitable for propagating said recipient cloning
vehicle.

The reagent kit furthermore contains, preferably, means for expressing a
site specific recombinase in said host cell, in particular, when the recipient
ET cloning product contains at least one site specific recombinase target
site. Moreover, the reagent kit can also contain DNA molecules suitable for
use as a source of linear DNA fragments used for ET cloning, preferably by
serving as templates for PCR generation of the linear fragment, also as
specifically designed DNA vectors from which the linear DNA fragment is
released by restriction enzyme cleavage, or as prepared linear fragments
included in the kit for use as positive controls or other tasks. Moreover, the
reagent kit can also contain nucleic acid amplification primers comprising
a region of homology to said vector. Preferably, this region of homology is
located at the 5'-end of the nucleic acid amplification primer.

The invention is further illustrated by the following Sequence listings,
Figures and Examples.

SEQ ID NO. 1: shows the nucleic acid sequence of the plasmid
pBAD24-recET (Fig. 7).

SEQ ID NOs 2/3: show the nucleic acid and amino acid sequences of the
truncated recE gene (t-recE) present on pBAD24-recET at positions
1320-2162.

SEQ ID NOs 4/5: show the nucleic acid and amino acid sequences of the
recT gene present on pBAD24-recET at positions 2155-2972.

SEQ ID NOs 6/7: show the nucleic acid and amino acid sequences of the
araC gene present on the complementary strand to the one shown of
pBAD24-recET at positions 974-996.

SEQ ID NOs 8/9: show the nucleic acid and amino acid sequences of the
bla gene present on pBAD24-recET at positions 3493-4353.

SEQ ID NO 10: shows the nucleic acid sequence of the plasmid pBAD-ETγ
(Fig. 13).

SEQ ID NO 11: shows the nucleic acid sequence of the plasmid pBAD-αβγ
(Fig. 14) as well as the coding regions for the genes redα (1320-200),
redβ (2086-2871) and redγ (3403-3819).

SEQ ID NOs 12-14: show the amino acid sequences of the Redα, Redβ and
Redγ proteins, respectively. The redγ sequence is present on each of
pBAD-ETγ (Fig. 13) and pBAD-αβγ (Fig. 14).

Figure 1 



A preferred method for ET cloning is shown by diagram. The linear DNA
fragment to be cloned is synthesized by PCR using oligonucleotide primers
that contain a left homology arm chosen to match sequences in the
recipient episome and a sequence for priming in the PCR reaction, and a
right homology arm chosen to match another sequence in the recipient
episome and a sequence for priming in the PCR reaction. The product of the
PCR reaction, here a selectable marker gene (sm1), is consequently flanked
by the left and right homology arms and can be mixed together in vitro with
the episome before co-transformation, or transformed into a host cell
harboring the target episome. The host cell contains the products of the
recE and recT genes. ET cloning products are identified by the combination
of two selectable markers, sm1 and sm2, on the recipient episome.

Figure 2 

Three ways to identify ET cloning products are depicted. The first, (on the 
left of the figure), shows the acquisition, by ET cloning, of a gene that 
conveys a phenotypic difference to the host, here a selectable marker gene 
(sm). The second (in the centre of the figure) shows the loss, by ET cloning, 
of a gene that conveys a phenotypic difference to the host, here a counter 
selectable marker gene (counter-sm). The third shows the loss of a target 
site (RT, shown as triangles on the circular episome) for a site specific 
recombinase (SSR), by ET cloning. In this case, the correct ET cloning 
product deletes one of the target sites required by the SSR to delete a 
selectable marker gene (sm). The failure of the SSR to delete the sm gene 
identifies the correct ET cloning product. 



Figure 3 

A simple example of ET cloning is presented. 

(a) Top panel - PCR products (left lane) synthesized from oligonucleotides
designed as described in Fig.1 to amplify by PCR a kanamycin resistance
gene and to be flanked by homology arms present in the recipient vector,
were mixed in vitro with the recipient vector (2nd lane) and cotransformed
into a recET+ E.coli host. The recipient vector carried an ampicillin
resistance gene. (b) Transformation of the sbcA E.coli strain JC9604 with
either the PCR product alone (0.2 µg) or the vector alone (0.3 µg) did not
convey resistance to double selection with ampicillin and kanamycin
(amp + kan), however cotransformation of both the PCR product and the
vector produced double resistant colonies. More than 95% of these colonies
contained the correct ET cloning product where the kanamycin gene had
precisely integrated into the recipient vector according to the choice of
homology arms. The two lanes on the right of (a) show Pvu II restriction
enzyme digestion of the recipient vector before and after ET cloning. (c) As
for b, except that six PCR products (0.2 µg each) were cotransformed with
pSVpaZ11 (0.3 µg each) into JC9604 and plated onto Amp + Kan plates or
Amp plates. Results are plotted as Amp + Kan-resistant colonies,
representing recombination products, divided by Amp-resistant colonies,
representing the plasmid transformation efficiency of the competent cell
preparation, x 10^. The PCR products were equivalent to the a-b PCR
product except that homology arm lengths were varied. Results are from
five experiments that used the same batches of competent cells and DNAs.
Error bars represent standard deviation. (d) Eight PCR products flanked by
50 bp homology arms were cotransformed with pSVpaZ11 into JC9604. All
eight PCR products contained the same left homology arm and amplified neo
gene. The right homology arms were chosen from the pSVpaZ11 sequence
to be adjacent to (0), or at increasing distances (7-3100 bp), from the left.
Results are from four experiments.



Figure 4 

ET cloning in an approximately 100 kb P1 vector to exchange the selectable
marker.

A P1 clone which uses a kanamycin resistance gene as selectable marker
and which contains at least 70 kb of the mouse Hoxa gene cluster was
used. Before ET cloning, this episome conveys kanamycin resistance (top
panel, upper left) to its host E.coli which are ampicillin sensitive (top panel,
upper right). A linear DNA fragment designed to replace the kanamycin
resistance gene with an ampicillin resistance gene was made by PCR as
outlined in Fig. 1 and transformed into E.coli host cells in which the recipient
Hoxa/P1 vector was resident. ET cloning resulted in the deletion of the
kanamycin resistance gene, and restoration of kanamycin sensitivity (top
panel, lower left) and the acquisition of ampicillin resistance (top panel,
lower right). Precise DNA recombination was verified by restriction digestion
and Southern blotting analyses of isolated DNA before and after ET cloning
(lower panel).

Figure 5 

ET cloning to remove a counter selectable marker 

A PCR fragment (upper panel, left, third lane) made as outlined in Figs.1
and 2 to contain the kanamycin resistance gene was directed by its chosen
homology arms to delete the counter selectable ccdB gene present in the
vector, pZero-2.1. The PCR product and the pZero vector were mixed in
vitro (upper panel, left, 1st lane) before cotransformation into a recE/recT+
E.coli host. Transformation of pZero-2.1 alone and plating onto kanamycin
selection medium resulted in little colony growth (lower panel, left).
Cotransformation of pZero-2.1 and the PCR product presented ET cloning
products (lower panel, right) which showed the intended molecular event
as visualized by Pvu II digestion (upper panel, right).

Figure 6 

ET cloning mediated by inducible expression of recE and recT from an 
episome. 

RecE/RecT mediate homologous recombination between linear and circular
DNA molecules. (a) The plasmid pBAD24-recET was transformed into E.coli
JC5547, and then batches of competent cells were prepared after induction
of RecE/RecT expression by addition of L-arabinose for the times indicated
before harvesting. A PCR product, made using oligonucleotides e and f to
contain the chloramphenicol resistance gene (cm) of pMAK705 and 50 bp
homology arms chosen to flank the ampicillin resistance gene (bla) of
pBAD24-recET, was then transformed and recombinants identified on
chloramphenicol plates. (b) Arabinose was added to cultures of
pBAD24-recET transformed JC5547 for different times immediately before
harvesting for competent cell preparation. Total protein expression was
analyzed by SDS-PAGE and Coomassie blue staining. (c) The number of
chloramphenicol resistant colonies per µg of PCR product was normalized
against a control for transformation efficiency, determined by including
5 pg pZero2.1, conveying kanamycin resistance, in the transformation and
plating an aliquot onto Kan plates.

Figure 7A 

The plasmid pBAD24-recET is shown by diagram. The plasmid contains the
genes recE (in a truncated form) and recT under control of the inducible
BAD promoter (PBAD). The plasmid further contains an ampicillin resistance
gene (Amp^r) and an araC gene.

Figure 7B

The nucleic acid sequence and the protein coding portions of pBAD24-recET
are depicted.



Figure 8 

Manipulation of a large E.coli episome by multiple recombination steps. a
Scheme of the recombination reactions. A P1 clone of the mouse Hoxa
complex, resident in JC9604, was modified by recombination with PCR
products that contained the neo gene and two Flp recombination targets
(FRTs). The two PCR products were identical except that one was flanked
by g and h homology arms (insertion), and the other was flanked by i and
h homology arms (deletion). In a second step, the neo gene was removed
by Flp recombination between the FRTs by transient transformation of a Flp
expression plasmid based on the pSC101 temperature-sensitive origin (ts
ori). b Upper panel; ethidium bromide stained agarose gel showing EcoRI
digestions of P1 DNA preparations from three independent colonies for each
step. Middle panel; a Southern blot of the upper panel hybridized with a neo
gene probe. Lower panel; a Southern blot of the upper panel hybridized with
a Hoxa3 probe to visualize the site of recombination. Lanes 1, the original
Hoxa3 P1 clone grown in E.coli strain NS3145. Lanes 2, replacement of the
Tn903 kanamycin resistance gene resident in the P1 vector with an
ampicillin resistance gene increased the 8.1 kb band (lanes 1) to 9.0 kb.
Lanes 3, insertion of the Tn5-neo gene with g-h homology arms upstream
of Hoxa3 increased the 6.7 kb band (lanes 1,2) to 9.0 kb. Lanes 4, Flp
recombinase deleted the g-h neo gene reducing the 9.0 kb band (lanes 3)
back to 6.7 kb. Lanes 5, deletion of 6 kb of Hoxa3 - 4 intergenic DNA by
replacement with the i-h neo gene decreased the 6.7 kb band (lanes 2) to
4.5 kb. Lanes 6, Flp recombinase deleted the i-h neo gene reducing the 4.5
kb band to 2.3 kb.



Figure 9 



Manipulation of the E.coli chromosome. a Scheme of the recombination
reactions. The endogenous lacZ gene of JC9604 at 7.8' of the E.coli
chromosome, shown in expanded form with relevant Ava I sites and
coordinates, was targeted by a PCR fragment that contained the neo gene
flanked by homology arms j and k, and loxP sites, as depicted. Integration
of the neo gene removed most of the lacZ gene including an Ava I site to
alter the 1443 and 3027 bp bands into a 3277 bp band. In a second step,
the neo gene was removed by Cre recombination between the loxPs by
transient transformation of a Cre expression plasmid based on the pSC101
temperature-sensitive origin (ts ori). Removal of the neo gene by Cre
recombinase reduces the 3277 bp band to 2111 bp. b β-galactosidase
expression evaluated by streaking colonies on X-Gal plates. The top row of
three streaks shows β-galactosidase expression in the host JC9604 strain
(w.t.), the lower three rows (Km) show 24 independent primary colonies,
20 of which display a loss of β-galactosidase expression indicative of the
intended recombination event. c Southern analysis of E.coli chromosomal
DNA digested with Ava I using a random primed probe made from the entire
lacZ coding region; lanes 1,2, w.t.; lanes 3-6, four independent white
colonies after integration of the j-k neo gene; lanes 7-10, the same four
colonies after transient transformation with the Cre expression plasmid.
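
The band sizes stated for the scheme of Figure 9 can be cross-checked by simple arithmetic. The following Python sketch is purely illustrative; the function name band_shifts and the dictionary keys are hypothetical, and only the four fragment lengths are taken from the description above.

```python
def band_shifts(wild_type_bands, integrated_band, excised_band):
    """Return the net length changes (in bp) implied by the diagnostic bands.

    wild_type_bands: the two Ava I bands that are joined when the
                     Ava I site between them is removed by integration.
    integrated_band: the single band seen after integration of neo.
    excised_band:    the band seen after Cre-mediated removal of neo.
    """
    merged = sum(wild_type_bands)  # length if the site were lost with no other change
    return {
        "merged_wild_type": merged,
        "net_change_on_integration": integrated_band - merged,
        "removed_by_cre": integrated_band - excised_band,
        "net_change_overall": excised_band - merged,
    }


# Fragment lengths as stated in the legend of Figure 9.
shifts = band_shifts((1443, 3027), 3277, 2111)
for name, value in shifts.items():
    print(f"{name}: {value} bp")
```

On these figures the Cre step shortens the band by 1166 bp, and the final band is 2359 bp shorter than the sum of the two wild-type bands.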

Figure 10 

Two rounds of ET cloning to introduce a point mutation. a Scheme of the
recombination reactions. The lacZ gene of pSVpaX1 was disrupted in
JC9604lacZ, a strain made by the experiment of Fig. 9 to ablate endogenous
lacZ expression and remove competitive sequences, by a sacB-neo gene
cassette, synthesized by PCR from pIB279 and flanked by l and m homology
arms. The recombinants, termed pSV-sacB-neo, were selected on
Amp + Kan plates. The lacZ gene of pSV-sacB-neo was then repaired by a
PCR fragment made from the intact lacZ gene using l* and m* homology
arms. The m* homology arm included a silent C to G change that created
a BamHI site. The recombinants, termed pSVpaX1*, were identified by
counter selection against the sacB gene using 7% sucrose. b
β-galactosidase expression from pSVpaX1 was disrupted in pSV-sacB-neo and
restored in pSVpaX1*. Expression was analyzed on X-gal plates. Three
independent colonies of each pSV-sacB-neo and pSVpaX1* are shown. c
Ethidium bromide stained agarose gels of BamHI digested DNA prepared
from independent colonies taken after counter selection with sucrose. All
β-galactosidase expressing colonies (blue) contained the introduced BamHI
restriction site (upper panel). All white colonies displayed large
rearrangements and no product carried the diagnostic 1.5 kb BamHI
restriction fragment (lower panel).

Figure 11 

Scheme of the plasmid, pBAD-ETγ, which carries the mobile ET system,
and the strategy employed to target the Hoxa P1 episome. pBAD-ETγ is
based on pBAD24 and includes (i) the truncated recE gene (t-recE) under
the arabinose-inducible PBAD promoter; (ii) the recT gene under the EM7
promoter; and (iii) the redγ gene under the Tn5 promoter. It was
transformed into NS3145, a recA E.coli strain which contained the Hoxa P1
episome. After arabinose induction, competent cells were prepared and
transformed with a PCR product carrying the chloramphenicol resistance
gene (cm) flanked by n and p homology arms. n and p were chosen to
recombine with a segment of the P1 vector. b Southern blots of Pvu II
digested DNAs hybridized with a probe made from the P1 vector to visualize
the recombination target site (upper panel) and a probe made from the
chloramphenicol resistance gene (lower panel). Lane 1, DNA prepared from
cells harboring the Hoxa P1 episome before ET cloning. Lanes 2-17, DNA
prepared from 16 independent chloramphenicol resistant colonies.

Figure 12 



Comparison of ET cloning using the recE/recT genes in pBAD-ETγ with
redα/redβ genes in pBAD-αβγ.

The plasmids pBAD-ETγ or pBAD-αβγ, depicted, were transformed into the
E.coli recA-, recBC+ strain, DK1, and targeted by a chloramphenicol gene
as described in Fig.6 to evaluate ET cloning efficiencies. Arabinose
induction of protein expression was for 1 hour.

Figure 13A

The plasmid pBAD-ETγ is shown by diagram.

Figure 13B

The nucleic acid sequence and the protein coding portions of pBAD-ETγ
are depicted.



Figure 14A

The plasmid pBAD-αβγ is shown by diagram. This plasmid corresponds to
the plasmid shown in Fig. 13 except that the recE and recT genes are
substituted by the redα and redβ genes.

Figure 14B

The nucleic acid sequence and the protein coding portions of pBAD-αβγ are
depicted.

1. Methods

1.1. Preparation of linear fragments

Standard PCR reaction conditions were used to amplify linear DNA
fragments. The sequences of the primers used are depicted in Table 1.

Table 1

The Tn5-neo gene from pJP5603 (Penfold and Pemberton, Gene 118
(1992), 145-146) was amplified by using oligo pairs a/b and c/d. The
chloramphenicol (cm) resistance gene from pMAK705 (Hashimoto-Gotoh and
Sekiguchi, J.Bacteriol. 131 (1977), 405-412) was amplified by using primer
pairs e/f and n/p. The Tn5-neo gene flanked by FRT or loxP sites was
amplified from pKaZ or pKaX
(http://www.embl-heidelberg.de/ExternalInfo/Stewart) using oligo pairs
i/h, g/h and j/k. The sacB-neo cassette from pIB279 (Blomfield et al.,
Mol.Microbiol. 5 (1991), 1447-1457) was amplified by using oligo pair l/m.
The lacZ gene fragment from pSVpaZ11 (Buchholz et al., Nucleic Acids
Res. 24 (1996), 4256-4262) was amplified using oligo pair l*/m*. PCR
products were purified using the QIAGEN PCR Purification Kit and eluted
with H2O, followed by digestion of any residual template DNA with Dpn I.
After digestion, PCR products were extracted once with Phenol:CHCl3,
ethanol precipitated and resuspended in H2O at approximately



1.2 Preparation of competent cells and electroporation

Saturated overnight cultures were diluted 50 fold into LB medium, grown
to an OD600 of 0.5, followed by chilling on ice for 15 min. Bacterial cells
were centrifuged at 7,000 rpm for 10 min at 0°C. The pellet was
resuspended in ice-cold 10% glycerol and centrifuged again (7,000 rpm,
-5°C, 10 min). This was repeated twice more and the cell pellet was
suspended in an equal volume of ice-cold 10% glycerol. Aliquots of 50 µl
were frozen in liquid nitrogen and stored at -80°C. Cells were thawed on
ice and 1 µl DNA solution (containing, for co-transformation, 0.3 µg plasmid
and 0.2 µg PCR products; or, for transformation, 0.2 µg PCR products) was
added. Electroporation was performed using ice-cold cuvettes and a Bio-Rad
Gene Pulser set to 25 µFD, 2.3 kV with Pulse Controller set at 200 ohms.
LB medium (1 ml) was added after electroporation. The cells were incubated
at 37°C for 1 hour with shaking and then spread on antibiotic plates.



1.3 Induction of RecE and RecT expression

E.coli JC5547 carrying pBAD24-recET was cultured overnight in LB medium
plus 0.2% glucose, 100 µg/ml ampicillin. Five parallel LB cultures, one of
which (0) included 0.2% glucose, were started by a 1/100 inoculation. The
cultures were incubated at 37°C with shaking for 4 hours and 0.1%
L-arabinose was added 3, 2, 1 or 1/2 hour before harvesting and processing
as above. Immediately before harvesting, 100 µl was removed for analysis
on a 10% SDS-polyacrylamide gel. E.coli NS3145 carrying Hoxa-P1 and
pBAD-ETγ was induced by 0.1% L-arabinose for 90 min before harvesting.



1.4 Transient transformation of FLP and Cre expression plasmids

The FLP and Cre expression plasmids, 705-Cre and 705-FLP (Buchholz et
al., Nucleic Acids Res. 24 (1996), 3118-3119), based on the pSC101
temperature sensitive origin, were transformed into rubidium chloride
competent bacterial cells. Cells were spread on 25 µg/ml chloramphenicol
plates, and grown for 2 days at 30°C, whereupon colonies were picked,
replated on L-agar plates without any antibiotics and incubated at 40°C
overnight. Single colonies were analyzed on various antibiotic plates and all
showed the expected loss of chloramphenicol and kanamycin resistance.

1.5 Sucrose counter selection of sacB expression 

The E.coli JC9604lacZ strain, generated as described in Fig. 9, was
cotransformed with a sacB-neo PCR fragment and pSVpaX1 (Buchholz et
al., Nucleic Acids Res. 24 (1996), 4256-4262). After selection on 100 µg/ml
ampicillin, 50 µg/ml kanamycin plates, pSVpaX-sacB-neo plasmids were
isolated and cotransformed into fresh JC9604lacZ cells with a PCR
fragment amplified from pSVpaX1 using primers l*/m*. Oligo m* carried a
silent point mutation which generated a BamHI site. Cells were plated on
7% sucrose, 100 µg/ml ampicillin, 40 µg/ml X-gal plates and incubated at
28°C for 2 days. The blue and white colonies grown on sucrose plates
were counted and further checked by restriction analysis.



1.6 Other methods 



DNA preparation and Southern analysis were performed according to
standard procedures. Hybridization probes were generated by random
priming of fragments isolated from the Tn5 neo gene (Pvu II), Hoxa3 gene
(both Hind III fragments), lacZ genes (EcoRI and BamHI fragments from
pSVpaX1), cm gene (BstB I fragments from pMAK705) and P1 vector
fragments (2.2 kb EcoRI fragments from P1 vector).



2. Results 



2.1 Identification of recombination events in E.coli 

To identify a flexible homologous recombination reaction in E.coli, an assay
based on recombination between linear and circular DNAs was designed
(Fig. 1, Fig. 3). Linear DNA carrying the Tn5 kanamycin resistance gene (neo)
was made by PCR (Fig.3a). Initially, the oligonucleotides used for PCR
amplification of neo were 60mers consisting of 42 nucleotides at their 5'
ends identical to chosen regions in the plasmid and, at the 3' ends, 18
nucleotides to serve as PCR primers. Linear and circular DNAs were mixed
in equimolar proportions and co-transformed into a variety of E.coli hosts.
Homologous recombination was only detected in sbcA E.coli hosts. More
than 95% of double ampicillin/kanamycin resistant colonies (Fig. 3b)
contained the expected homologously recombined plasmid as determined
by restriction digestion and sequencing. Only a low background of
kanamycin resistance, due to genomic integration of the neo gene, was
apparent (not shown).

The linear plus circular recombination reaction was characterized in two
ways. The relationship between homology arm length and recombination
efficiency was simple, with longer arms recombining more efficiently
(Fig.3c). Efficiency increased within the range tested, up to 60 bp. The
effect of distance between the two chosen homology sites in the recipient
plasmid was examined (Fig.3d). A set of eight PCR fragments was
generated by use of a constant left homology arm with differing right
homology arms. The right homology arms were chosen from the plasmid
sequence to be 0 - 3100 bp from the left. Correct products were readily
obtained from all, with less than 4 fold difference between them, although
the insertional product (0) was least efficient. Correct products also
depended on the presence of both homology arms, since PCR fragments
containing only one arm failed to work.
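
The oligonucleotide layout described above (a 42 nucleotide homology arm at the 5' end followed by an 18 nucleotide priming sequence at the 3' end) can be sketched in Python as follows. The sketch is illustrative only; the function names, the coordinate arguments and the use of plain strings for sequences are assumptions and are not taken from the experiments.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(target: str, left_start: int, right_start: int,
                   cassette: str, arm_len: int = 42, prime_len: int = 18):
    """Build a primer pair carrying homology arms at the 5' ends.

    target:      top-strand sequence of the recipient (hypothetical input)
    left_start:  0-based start of the left homology region in `target`
    right_start: 0-based start of the right homology region in `target`
    cassette:    top-strand sequence of the gene to be amplified
    """
    left_arm = target[left_start:left_start + arm_len]
    right_arm = target[right_start:right_start + arm_len]
    if len(left_arm) < arm_len or len(right_arm) < arm_len:
        raise ValueError("homology region runs past the end of the target")
    if len(cassette) < prime_len:
        raise ValueError("cassette shorter than the priming sequence")

    # Forward primer: left arm (5') followed by the start of the cassette (3').
    forward = left_arm + cassette[:prime_len]
    # Reverse primer: written 5'->3' on the bottom strand, so the right arm
    # ends up at the 5' end and the cassette priming region at the 3' end.
    reverse = reverse_complement(cassette[-prime_len:] + right_arm)
    return forward, reverse


# Hypothetical toy sequences, chosen only to exercise the function.
toy_target = "ACGT" * 50
toy_cassette = "GATTACA" * 10
fwd, rev = design_primers(toy_target, 0, 100, toy_cassette)
```

With the default arguments each primer is a 60mer, in line with the 42 + 18 nucleotide design; other arm lengths can be tried by changing arm_len.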

2.2 Involvement of RecE and RecT 

The relationship between host genotype and this homologous recombination
reaction was more systematically examined using a panel of E.coli strains
deficient in various recombination components (Table 2).

Table 2

Only the two sbcA strains, JC8679 and JC9604, presented the intended
recombination products and RecA was not required. In sbcA strains
expression of RecE and RecT is activated. Dependence on recE can be
inferred from comparison of JC8679 with JC8691. Notably no
recombination products were observed in JC9387, suggesting that the
sbcBC background is not capable of supporting homologous recombination
based on 50 nucleotide homology arms.

To demonstrate that RecE and RecT are involved, part of the recET operon
was cloned into an inducible expression vector to create pBAD24-recET



WO 99/29837

PCT/EP98/07945



(Fig.6a). The recE gene was truncated at its N-terminal end, as the first 588
a.a.s of RecE are dispensable. The recBC strain, JC5547, was transformed
with pBAD24-recET and a time course of RecE/RecT induction performed
by adding arabinose to the culture media at various times before harvesting
for competent cells. The batches of harvested competent cells were
evaluated for protein expression by gel electrophoresis (Fig.6b) and for
recombination between a linear DNA fragment and the endogenous
pBAD24-recET plasmid (Fig.6c). Without induction of RecE/RecT no
recombinant products were found, whereas recombination increased in
approximate concordance with increased RecE/RecT expression. This
experiment also shows that co-transformation of linear and circular DNAs
is not essential and the circular recipient can be endogenous in the host.
From the results shown in Figs.3, 6 and Table 2, we conclude that RecE
and RecT mediate a very useful homologous recombination reaction in
recBC E.coli at workable frequencies. Since RecE and RecT are involved, we
refer to this way of recombining linear and circular DNA fragments as "ET
cloning".

2.3 Application of ET cloning to large target DNAs 

To show that large DNA episomes could be manipulated in E.coli, a > 76
kb P1 clone that contains at least 59 kb of the intact mouse Hoxa complex
(confirmed by DNA sequencing and Southern blotting) was transferred to
an E.coli strain having an sbcA background (JC9604) and subjected to two
rounds of ET cloning. In the first round, the Tn903 kanamycin resistance
gene resident in the P1 vector was replaced by an ampicillin resistance gene
(Fig.4). In the second round, the interval between the Hoxa3 and a4 genes
was targeted either by inserting the neo gene between two base pairs
upstream of the Hoxa3 proximal promoter, or by deleting 6203 bp between
the Hoxa3 and a4 genes (Fig.8a). Both insertional and deletional ET cloning
products were readily obtained (Fig.8b, lanes 2, 3 and 5) showing that the
two rounds of ET cloning took place in this large E.coli episome with
precision and no apparent unintended recombination.

The general applicability of ET cloning was further examined by targeting
a gene in the E.coli chromosome (Fig.9a). The β-galactosidase (lacZ) gene
of JC9604 was chosen so that the ratio between correct and incorrect
recombinants could be determined by evaluating β-galactosidase
expression. Standard conditions (0.2 µg PCR fragment; 50 µl competent
cells) produced 24 primary colonies, 20 of which were correct as
determined by β-galactosidase expression (Fig.9b) and DNA analysis
(Fig.9c, lanes 3-6).

2.4 Secondary recombination reactions to remove operational sequences 

The products of ET cloning as described above are limited by the necessary 
inclusion of selectable marker genes. Two different ways to use a further 
recombination step to remove this limitation were developed. In the first 
way, site specific recombination mediated by either Flp or Cre recombinase 
was employed. In the experiments of Figs.8 and 9, either Flp recombination 
target sites (FRTs) or Cre recombination target sites (loxPs) were included 
to flank the neo gene in the linear substrates. Recombination between the 
FRTs or loxPs was accomplished by Flp or Cre, respectively, expressed from 
plasmids with the pSC101 temperature sensitive replication origin
(Hashimoto-Gotoh and Sekiguchi, J.Bacteriol. 131 (1977), 405-412) to
permit simple elimination of these plasmids after site specific recombination
by temperature shift. The precisely recombined Hoxa P1 vector was
recovered after both ET and Flp recombination with no other recombination
products apparent (Fig.8, lanes 4 and 6). Similarly, Cre recombinase
precisely recombined the targeted lacZ allele (Fig.9c, lanes 7-10). Thus site
specific recombination can be readily coupled with ET cloning to remove
operational sequences and leave a 34 bp site specific recombination target
site at the point of DNA manipulation.

In the second way to remove the selectable marker gene, two rounds of ET
cloning, combining positive and counter selection steps, were used to leave
the DNA product free of any operational sequences (Fig. 10a).

Additionally this experiment was designed to evaluate, by a functional test 
based on β-galactosidase activity, whether ET cloning promoted small 
mutations such as frame shift or point mutations within the region being 
manipulated. In the first round, the lacZ gene of pSVpaX1 was disrupted 
with a 3.3 kb PCR fragment carrying the neo and B.subtilis sacB (Blomfield 
et al., Mol.Microbiol. 5 (1991), 1447-1457) genes, by selection for 
kanamycin resistance (Fig. 10a). As shown above for other positively 
selected recombination products, virtually all selected colonies were white 
(Fig. 10b), indicative of successful lacZ disruption, and 17 of 17 were 
confirmed as correct recombinants by DNA analysis. In the second round, 
a 1.5 kb PCR fragment designed to repair lacZ was introduced by counter 
selection against the sacB gene. Repair of lacZ included a silent point 
mutation to create a BamHI restriction site. Approximately one quarter of 
sucrose resistant colonies expressed β-galactosidase, and all analyzed (17 
of 17; Fig. 10c) carried the repaired lacZ gene with the BamHI point 
mutation. The remaining three quarters of sucrose resistant colonies did not 
express β-galactosidase, and all analyzed (17 of 17; Fig. 10c) had 
undergone a variety of large mutational events, none of which resembled 
the ET cloning product. Thus, in two rounds of ET cloning directed at the 
lacZ gene, no disturbances of β-galactosidase activity by small mutations 
were observed, indicating that RecE/RecT recombination works with high 
fidelity. The significant presence of incorrect products observed in the 
counter selection step is an inherent limitation of the use of counter 
selection, since any mutation that ablates expression of the counter 
selection gene will be selected. Notably, all incorrect products were large 
mutations and therefore easily distinguished from the correct ET product by 
DNA analysis. In a different experiment (Fig. 5), we observed that ET cloning 
into pZero2.1 (Invitrogen) by counter selection against the ccdB gene gave 
a lower background of incorrect products (8%), indicating that the counter 
selection background is variable according to parameters that differ from 
those that influence ET cloning efficiencies. 
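
The practical consequence of a variable counter selection background can be estimated with simple arithmetic. The following Python sketch assumes that colonies are independent and that the fractions of correct products reported above (about one quarter with sacB, 92% with ccdB) are representative. Under those assumptions it computes how many colonies would have to be analyzed to find at least one correct recombinant with a chosen probability. The function name and the 95% target are illustrative choices and are not part of the experiments described.

```python
import math


def colonies_to_screen(fraction_correct: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one correct colony among n) >= confidence.

    Assumes every colony is independently correct with probability
    `fraction_correct` (an idealization, not a measured property).
    """
    if not 0.0 < fraction_correct <= 1.0:
        raise ValueError("fraction_correct must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if fraction_correct == 1.0:
        return 1
    # P(no correct colony in n picks) = (1 - f)^n must not exceed 1 - confidence.
    n = math.log(1.0 - confidence) / math.log(1.0 - fraction_correct)
    return max(1, math.ceil(n))


if __name__ == "__main__":
    # Fractions taken from the text: about one quarter (sacB) and 92% (ccdB).
    for label, fraction in (("sacB", 0.25), ("ccdB", 0.92)):
        print(label, colonies_to_screen(fraction))
```

With these inputs the estimate is about a dozen colonies for the sacB counter selection and two for ccdB, which is consistent with the observation that incorrect products are readily excluded by DNA analysis of a small number of colonies.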

2.5 Transference of ET cloning between E.coli hosts 

The experiments shown above were performed in recBC- E.coli hosts since 
the sbcA mutation had been identified as a suppressor of recBC (Barbour 
et al., Proc.Natl.Acad.Sci. USA 67 (1970), 128-135; Clark, Genetics 78 
(1974), 259-271). However, many useful E.coli strains are recBC+, 
including strains commonly used for propagation of P1, BAC or PAC 
episomes. To transfer ET cloning into recBC+ strains, we developed 
pBAD-ETγ and pBAD-αβγ (Figs. 13 and 14). These plasmids incorporate three 
features important to the mobility of ET cloning. First, RecBC is the major 
E.coli exonuclease and degrades introduced linear fragments. Therefore the 
RecBC inhibitor, Redγ (Murphy, J. Bacteriol. 173 (1991), 5808-5821), was 
included. Second, the recombinogenic potential of RecE/RecT, or 
Redα/Redβ, was regulated by placing recE or redα under an inducible 
promoter. Consequently ET cloning can be induced when required and 
undesired recombination events are restricted at other times. Third, 
we observed that ET cloning efficiencies are enhanced when RecT, or Redβ, 
but not RecE, or Redα, is overexpressed. Therefore we placed recT, or redβ, 
under the strong, constitutive, EM7 promoter. 

pBAD-ETγ was transformed into NS3145 E.coli harboring the original Hoxa 
P1 episome (Fig. 11a). A region in the P1 vector backbone was targeted by 
PCR amplification of the chloramphenicol resistance gene (cm) flanked by 
n and p homology arms. As described above for positively selected ET 
cloning reactions, most (> 90%) chloramphenicol resistant colonies were 
correct. Notably, the overall efficiency of ET cloning, in terms of linear DNA 
transformed, was nearly three times better using pBAD-ETγ than with 
similar experiments based on targeting the same episome in the sbcA host, 
JC9604. This is consistent with our observation that overexpression of 
RecT improves ET cloning efficiencies. 
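
The construction of a linear targeting fragment by PCR, as used above for the cm cassette, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the primer design actually used: each primer is assumed to consist of a homology arm at its 5' end followed by a priming region for the marker cassette, both arms are given 5' to 3' on the top strand of the target, and the reverse priming region is supplied already in primer orientation. The function names, the example sequences and the `min_arm` parameter are hypothetical; the default of 15 nucleotides mirrors the minimum homology length stated in the claims.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_et_primers(left_arm, right_arm, fwd_priming, rev_priming, min_arm=15):
    """Build a forward and a reverse primer for an ET cloning PCR fragment.

    left_arm, right_arm: homology arms, 5'->3' on the target's top strand.
    fwd_priming, rev_priming: cassette priming regions in primer orientation.
    """
    for name, seq in (("left_arm", left_arm), ("right_arm", right_arm),
                      ("fwd_priming", fwd_priming), ("rev_priming", rev_priming)):
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"{name} must be a non-empty A/C/G/T string")
    for name, arm in (("left_arm", left_arm), ("right_arm", right_arm)):
        if len(arm) < min_arm:
            raise ValueError(f"{name} is shorter than {min_arm} nucleotides")
    forward = left_arm + fwd_priming
    # The reverse primer anneals to the top strand, so its arm is the
    # reverse complement of the right homology region.
    reverse = reverse_complement(right_arm) + rev_priming
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 20-nucleotide arms and priming regions, for illustration only.
    fwd, rev = design_et_primers(
        left_arm="ATGACCATGATTACGGATTC",
        right_arm="GGCCGTCGTTTTACAACGTC",
        fwd_priming="TGTGACGGAAGATCACTTCG",
        rev_priming="ACCAGCAATAGACATAAGCG",
    )
    print(fwd)
    print(rev)
```

Keeping the arms as separate inputs makes it easy to retarget the same marker cassette to a different position by changing only the two homology regions.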

A comparison between ET cloning efficiencies mediated by RecE/RecT, 
expressed from pBAD-ETγ, and Redα/Redβ, expressed from pBAD-αβγ, was 
made in the recA-, recBC+ E.coli strain, DK1 (Fig. 12). After transformation 
of E.coli DK1 with either pBAD-ETγ or pBAD-αβγ, the same experiment as 
described in Figure 6a,c, to replace the bla gene of the pBAD vector with 
a chloramphenicol resistance gene, was performed. Both pBAD-ETγ and 
pBAD-αβγ presented similar ET cloning efficiencies in terms of 
responsiveness to arabinose induction of RecE and Redα, and number of 
targeted events. 

Table 2 

E.coli strain   Genotype             Amp+Kan   Amp (x 10^?) 

JC8679          recBC sbcA           318       2.30 
JC9604          recA recBC sbcA      114       0.30 
JC8691          recBC sbcA recE      0         0.37 
JC5547          recA recBC           0         0.37 
JC5519          recBC                0         1.80 
JC15329         recA recBC sbcBC     0         0.03 
JC9387          recBC sbcBC          0         2.20 
JC8111          recBC sbcBC recF     0         2.40 
JC9366          recA                 0         0.37 
JC13031         recJ                 0         0.45 
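
The pattern in Table 2 can be made explicit with a short Python sketch. It encodes the table as data and reports which genotypes gave doubly resistant (Amp+Kan) colonies. Reading the Amp column as a transformation control, and the ratio computed below as a relative efficiency, are assumptions made here for illustration; the scale factor of the Amp column is not legible in the source, so the ratios are meaningful only relative to one another.

```python
from typing import NamedTuple


class StrainResult(NamedTuple):
    strain: str
    genotype: frozenset  # mutant alleles listed in Table 2
    amp_kan: int         # colonies resistant to Amp + Kan
    amp: float           # Amp colonies (scaled; scale factor unknown)


def row(strain, genotype, amp_kan, amp):
    return StrainResult(strain, frozenset(genotype.split()), amp_kan, amp)


TABLE_2 = [
    row("JC8679", "recBC sbcA", 318, 2.30),
    row("JC9604", "recA recBC sbcA", 114, 0.30),
    row("JC8691", "recBC sbcA recE", 0, 0.37),
    row("JC5547", "recA recBC", 0, 0.37),
    row("JC5519", "recBC", 0, 1.80),
    row("JC15329", "recA recBC sbcBC", 0, 0.03),
    row("JC9387", "recBC sbcBC", 0, 2.20),
    row("JC8111", "recBC sbcBC recF", 0, 2.40),
    row("JC9366", "recA", 0, 0.37),
    row("JC13031", "recJ", 0, 0.45),
]


def relative_efficiency(result: StrainResult) -> float:
    """Amp+Kan colonies per unit of the Amp control (arbitrary units)."""
    return result.amp_kan / result.amp


if __name__ == "__main__":
    for r in TABLE_2:
        if r.amp_kan > 0:
            print(r.strain, sorted(r.genotype), round(relative_efficiency(r)))
```

Only the two strains carrying sbcA together with an intact recE gene produce recombinants in this data set, and the recA status does not prevent recombination.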
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Claims 



1. A method for cloning DNA molecules in cells comprising the steps of: 

a) providing a host cell capable of performing homologous 
recombination, 

b) contacting in said host cell a first DNA molecule which is 
capable of being replicated in said host cell with a second 
DNA molecule comprising at least two regions of sequence 
homology to regions on the first DNA molecule, under 
conditions which favour homologous recombination between 
said first and second DNA molecules and 

c) selecting a host cell in which homologous recombination 
between said first and second DNA molecules has occurred. 

2. The method according to claim 1 wherein the homologous 
recombination occurs via the recET cloning mechanism. 

3. The method according to claim 2 wherein the host cell is capable of 
expressing recE and recT genes. 

4. The method according to claim 3 wherein the recE and recT genes 
are selected from E.coli recE and recT genes or from λ redα and redβ 
genes. 

5. The method according to claim 3 or 4 wherein the host cell is 
transformed with at least one vector capable of expressing recE 
and/or recT genes. 

6. The method of claim 3, 4 or 5 wherein the expression of the recE 
and/or recT genes is under control of a regulatable promoter. 

7. The method of claim 5 or 6 wherein the recT gene is overexpressed 
versus the recE gene. 

8. The method according to any one of claims 3 to 7 wherein the recE 
gene is selected from a nucleic acid molecule comprising 

(a) the nucleic acid sequence from position 1320 (ATG) to 2159 
(GAC) as depicted in Fig. 7B, 

(b) the nucleic acid sequence from position 1320 (ATG) to 1998 
(CGA) as depicted in Fig. 13B, 

(c) a nucleic acid encoding the same polypeptide within the 
degeneracy of the genetic code and/or 

(d) a nucleic acid sequence which hybridizes under stringent 
conditions with the nucleic acid sequence from (a), (b) and/or (c). 

9. The method according to any one of claims 3 to 8 wherein the recT 
gene is selected from a nucleic acid molecule comprising 

(a) the nucleic acid sequence from position 2155 (ATG) to 2961 
(GAA) as depicted in Fig. 7B, 

(b) the nucleic acid sequence from position 2086 (ATG) to 2868 
(GCA) as depicted in Fig. 13B, 

(c) a nucleic acid encoding the same polypeptide within the 
degeneracy of the genetic code and/or 

(d) a nucleic acid sequence which hybridizes under stringent 
conditions with the nucleic acid sequences from (a), (b) and/or (c). 

10. The method according to any one of the previous claims wherein the 
host cell is a gram-negative bacterial cell. 

11. The method according to claim 10 wherein the host cell is an 
Escherichia coli cell. 

12. The method according to claim 11 wherein the host cell is an 
Escherichia coli K12 strain. 



13. The method according to claim 12 wherein the E.coli strain is 
selected from JC 8679 and JC 9604. 

14. The method according to any one of the previous claims wherein the 
host cell further is capable of expressing a recBC inhibitor gene. 

15. The method according to claim 14 wherein the host cell is 
transformed with a vector expressing the recBC inhibitor gene. 

16. The method according to claim 14 or 15 wherein the recBC inhibitor 
gene is selected from a nucleic acid molecule comprising 

(a) the nucleic acid sequence from position 3588 (ATG) to 4002 
(GTA) as depicted in Fig.lSB, 

(b) a nucleic acid encoding the same polypeptide within the 
degeneracy of the genetic code and/or 

(c) a nucleic acid sequence which hybridizes under stringent 
conditions (as defined above) with the nucleic acid sequence from (a) 
and/or (b). 



17. The method according to any one of claims 13 to 16 wherein the 
host cell is a prokaryotic recBC+ cell. 

18. The method according to any one of the previous claims wherein the 
first DNA molecule is circular. 

19. The method according to any one of the previous claims wherein the 
first DNA molecule is an extrachromosomal DNA molecule containing 
an origin of replication which is operative in the host cell. 

20. The method according to claim 18 or 19 wherein the first DNA 
molecule is selected from plasmids, cosmids, P1 vectors, BAC 
vectors and PAC vectors. 

21. The method according to any one of claims 1-18 wherein the first 
DNA molecule is a host cell chromosome. 

22. The method according to any one of the previous claims wherein the 
second DNA molecule is linear. 

23. The method according to any one of the previous claims wherein the 
regions of sequence homology are at least 15 nucleotides each. 

24. The method according to one of claims 1 to 16 wherein the second 
DNA molecule is obtained by an amplification reaction. 

25. The method according to one of the previous claims wherein the first 
and/or second DNA molecules are introduced into the host cells by 
transformation. 

26. The method according to claim 25 wherein the transformation 
method is electroporation. 

27. The method according to one of claims 1 to 26 wherein the first and 
second DNA molecules are introduced into the host cell 
simultaneously by co-transformation. 

28. The method according to one of claims 1 to 26 wherein the second 
DNA molecule is introduced into a host cell in which the first DNA 
molecule is already present. 

29. The method according to one of the previous claims wherein the 
second DNA molecule contains at least one marker gene placed 
between the two regions of sequence homology and wherein 
homologous recombination is detected by expression of said marker 
gene. 

30. The method according to claim 29 wherein said marker gene is selected 
from antibiotic resistance genes, deficiency complementation genes 
and reporter genes. 

31. The method of any one of claims 1 to 30 wherein the first DNA 
molecule contains at least one marker gene between the two regions 
of sequence homology and wherein homologous recombination is 
detected by lack of expression of said marker gene. 

32. The method of any one of claims 1 to 31 wherein said marker gene 
is selected from genes which, under selected conditions, convey a 
toxic or bacteriostatic effect on the cell, and reporter genes. 

33. A method according to any one of the previous claims wherein the 
first DNA molecule contains at least one target site for a site specific 
recombinase between the two regions of sequence homology and 
wherein homologous recombination is detected by removal of said 
target site. 

34. A method for cloning DNA molecules comprising the steps of: 

(a) providing a source of RecE and RecT proteins, 

(b) contacting a first DNA molecule which is capable of being 
replicated in a suitable host cell with a second DNA molecule 
comprising at least two regions of sequence homology to regions on 
the first DNA molecule, under conditions which favour homologous 
recombination between said first and second DNA molecules and 

(c) selecting DNA molecules in which homologous recombination 
between said first and second DNA molecules has occurred. 

35. The method of claim 34 wherein said RecE and RecT proteins are 
selected from E.coli RecE and RecT proteins or from phage λ Redα 
and Redβ proteins. 

36. The method of claim 34 or 35 wherein the recombination occurs in 
vitro. 

37. The method of claim 34 or 35 wherein the recombination occurs in 
vivo. 

38. Use of cells capable of expressing the recE and recT genes as a host 
cell for a cloning method involving homologous recombination. 

39. Use of a vector system capable of expressing recE and recT genes 
in a host cell for a cloning method involving homologous 
recombination. 

40. Use of claims 38 or 39 wherein the recE and recT genes are selected 
from E.coli recE and recT genes or from λ redα and redβ genes. 

41. Use of a source of RecE and RecT proteins for a cloning method 
involving homologous recombination. 

42. Use of claim 41 wherein said RecE and RecT proteins are selected 
from E.coli RecE and RecT proteins or from phage λ Redα and Redβ 
proteins. 

43. A reagent kit for cloning comprising 

(a) a host cell 

(b) means of expressing recE and recT genes in said host cell and 

(c) a recipient cloning vehicle capable of being replicated in said cell. 

44. The reagent kit according to claim 43 wherein the means (b) 
comprise a vector system capable of expressing the recE and recT 
genes in the host cell. 

45. The reagent kit according to claim 43 or 44 wherein the recE and 
recT genes are selected from E.coli recE and recT genes or from λ 
redα and redβ genes. 

46. A reagent kit for cloning comprising 

(a) a source for RecE and RecT proteins and 

(b) a recipient cloning vehicle capable of being propagated in a host 
cell. 

47. The reagent kit according to claim 46 further comprising a host cell 
suitable for propagating said recipient cloning vehicle. 

48. The reagent kit according to claim 46 or 47 wherein said RecE and 
RecT proteins are selected from E.coli RecE and RecT proteins or 
from phage λ Redα and Redβ proteins. 

49. The reagent kit according to any one of claims 43-48 further 
comprising means for expressing a site specific recombinase in said 
host cell. 

50. The reagent kit according to any one of claims 43-49 further 
comprising nucleic acid amplification primers comprising a region of 
homology to said recipient cloning vehicle. 

Figure 4b 
bacterial strain (Kan resistance) 
2 of P1-Hox clones in JC9604 before homologous recombination (Kan resistance) 
3 of P1-Hox clones in JC9604 after homologous recombination (Amp resistance) 

Figure 6 
Figure 7a 
Figure 7b 



1 ATCGATGCATAATGTGCCTGTCAAATGGACGAAGCAGGGATTC 
44 TGCAAACCCTATGCTACTCCGTCAAGCCGTCAATTGTCTXSATT 
87 CGTTACCAA TTA TGA CAA CTT GAC GGC TAG ATC 
ACG 
241^ Thr Ser Leu Leu Leu Lys Ala Gln Ser Ile Arg 

285 TTG GTC CTC GCG CCA GCT TAA GAC GCT AAT CCC 
230^ Gln Asp Glu Arg Trp Ser Leu Val Ser Ile Gly 

318 TAA CTG CTG GCG GAA AAG ATG TGA CAG ACG CGA 
219^ Leu Gln Gln Arg Phe Leu His Ser Leu Arg Ser 

351 CGG CGA CAA GCA AAC ATG CTG TGC GAC GCT GGC 
208^ Pro Ser Leu Cys Val His Gln Ala Val Ser Ala 

EcoRV 

384 GAT ATC AAA ATT GCT GTC TGC CAG GTG ATC GCT 
197^ Ile Asp Phe Asn Ser Asp Ala Leu His Asp Ser 

417 GAT GTA CTG ACA AGC CTC GCG TAC CCG ATT ATC 
186^ Ile Tyr Gln Cys Ala Glu Arg Val Arg Asn Asp 
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Figure 7b (cont'd) 



450 CAT CGG TGG ATG GAG CGA CTC GTT AAT CGC TTC 
175^ Met Pro Pro His Leu Ser Glu Asn Ile Ala Glu 

483 CAT GCG CCG CAG TAA CAA TTG CTC AAG CAG ATT 
164^ Met Arg Arg Leu Leu Leu Gln Glu Leu Leu Asn 

516 TAT CGC CAG CAG CTC CGA ATA GCG CCC TTC CCC 
153^ Ile Ala Leu Leu Glu Ser Tyr Arg Gly Glu Gly 

549 TTG CCC GGC GTT AAT GAT TTG CCC AAA CAG GTC 
142^ Gln Gly Ala Asn Ile Ile Gln Gly Phe Leu Asp 

582 GCT GAA ATG CGG NNN NNN CGC TTC ATC CGG GCG 
131^ Ser Phe His Pro Gln His Ala Glu Asp Pro Arg 

615 AAA GAA CCC CGT ATT GGC AAA TAT TGA CGG CCA 
120^ Phe Phe Gly Thr Asn Ala Phe Ile Ser Pro Trp 

648 GTT AAG CCA TTC ATG CCA GTA GGC GCG CGG ACG 
109^ Asn Leu Trp Glu His Trp Tyr Ala Arg Pro Arg 

681 AAA GTA AAC CCA CTG GTG ATA CCA TTC GCG AGC 
98^ Phe Tyr Val Trp Gln His Tyr Trp Glu Arg Ala 

714 NNN CGG ATG ACG ACC GTA GTG ATG AAT CTC TCC 
87^ Glu Pro His Arg Gly Tyr His His Ile Glu Gly 

747 NNN CGG GAA CAG NNN AAT ATC ACC CGG TCG GCA 
76^ Pro Pro Phe Leu Leu Ile Asp Gly Pro Arg Cys 

780 AAC AAA TTC TCG TCC CTG ATT TTT CAC CAC CCC 
65^ Val Phe Glu Arg Gly Gln Asn Lys Val Val Gly 

813 CTG ACC GCG AAT GGT GAG ATT GAG AAT ATA ACC 
54^ Gln Gly Arg Ile Thr Leu Asn Leu Ile Tyr Gly 

846 TTT CAT TCC CAG CGG TCG GTC GAT AAA AAA ATC 
43^ Lys Met Gly Leu Pro Arg Asp Ile Phe Phe Asp 
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Figure 7b (cont'd) 

879 GAG ATA ACC GTT GGC CTC AAT CGG CGT TAA ACC 
32^ Leu Tyr Gly Asn Ala Glu Ile Pro Thr Leu Gly 

912 CGC CAC GAG ATG GGC ATT AAA CGA GTA TCC CGG 
21^ Ala Val Leu His Ala Asn Phe Ser Tyr Gly Pro 

945 CAG CAG GGG ATC ATT TTG CGC TTC AGC CAT 
10^ Leu Leu Pro Asp Asn Gln Ala Glu Ala Met 



975 ACTTTTCATA CTCCCGCCAT TCAGAGAAGA AACCAATTGT 
1015 CCATATTGCA TCAGACATTG CCGTCACTGC GTCTTTTACT 
1055 GGCTCTTCTC GCTAACCAAA CCGGTAACCC CGCTTATTAA 
1095 AAGCATTCTG TAACAAAGCG GGACCAAAGC CATGACAAAA 
1135 ACGCGTAACA AAAGTGTCTA TAATCACGGC AGAAAAGTCC 
1175 ACATTGATTA TTTGCACGGC GTCACACTTT GCTATGCCAT 

BamHI 

1215 AGCATTTTTA TCCATAAGAT TAGCGGATCC TACCTGACGC 
1255 TTTTTATCGC AACTCTCTAC TGTTTCTCCA TACCCGTTTT 

NheI EcoRI NcoI 

1295 TTTGGGCTAG CAGGAGGAAT TCACC ATG GAT CCC GTA 
1^ Met Asp Pro Val 

1332 ATC GTA GAA GAC ATA GAG CCA GGT ATT TAT TAC 
5^ Ile Val Glu Asp Ile Glu Pro Gly Ile Tyr Tyr 

1365 GGA ATT TCG AAT GAG AAT TAC CAC GCG GGT CCC 
16^ Gly Ile Ser Asn Glu Asn Tyr His Ala Gly Pro 

1398 GGT ATC AGT AAG TCT CAG CTC GAT GAC ATT GCT 
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Figure 7b (cont'd) 

27^ Gly Ile Ser Lys Ser Gln Leu Asp Asp Ile Ala 

1431 GAT ACT CCG GCA CTA TAT TTG TGG CGT AAA AAT 
38^ Asp Thr Pro Ala Leu Tyr Leu Trp Arg Lys Asn 

1464 GCC CCC GTG GAC ACC ACA AAG ACA AAA ACG CTC 
49^ Ala Pro Val Asp Thr Thr Lys Thr Lys Thr Leu 

1497 GAT TTA GGA ACT GCT TTC CAC TGC CGG GTA CTT 
60^ Asp Leu Gly Thr Ala Phe His Cys Arg Val Leu 

EcoRI 

1530 GAA CCG GAA GAA TTC AGT AAC CGC TTT ATC GTA 
71^ Glu Pro Glu Glu Phe Ser Asn Arg Phe Ile Val 

1563 GCA CCT GAA TTT AAC CGC CGT ACA AAC GCC GGA 
82^ Ala Pro Glu Phe Asn Arg Arg Thr Asn Ala Gly 

1596 AAA GAA GAA GAG AAA GCG TTT CTG ATG GAA TGC 
93^ Lys Glu Glu Glu Lys Ala Phe Leu Met Glu Cys 

1629 GCA AGC ACA GGA AAA ACG GTT ATC ACT GCG GAA 
104^ Ala Ser Thr Gly Lys Thr Val Ile Thr Ala Glu 

1662 GAA GGC CGG AAA ATT GAA CTC ATG TAT CAA AGC 
115^ Glu Gly Arg Lys Ile Glu Leu Met Tyr Gln Ser 
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Figure 7b (cont'd) 

1695 GTT ATG GCT TTG CCG CTG GGG CAA TGG CTT GTT 
126^ Val Met Ala Leu Pro Leu Gly Gln Trp Leu Val 

1728 GAA AGC GCC GGA CAC GCT GAA TCA TCA ATT TAC 
137^ Glu Ser Ala Gly His Ala Glu Ser Ser Ile Tyr 

1761 TGG GAA GAT CCT GAA ACA GGA ATT TTG TGT CGG 
148^ Trp Glu Asp Pro Glu Thr Gly Ile Leu Cys Arg 

1794 TGC CGT CCG GAC AAA ATT ATC CCT G?^ TTT CAC 
159^ Cys Arg Pro Asp Lys Ile Ile Pro Glu Phe His 

1827 TGG ATC ATG GAC GTG AAA ACT ACG GCG GAT ATT 
170^ Trp Ile Met Asp Val Lys Thr Thr Ala Asp Ile 

1860 CAA CGA TTC AAA ACC GCT TAT TAC GAC TAC CGC 
181^ Gln Arg Phe Lys Thr Ala Tyr Tyr Asp Tyr Arg 

1893 TAT CAC GTT CAG GAT GCA TTC TAC AGT GAC GGT 
192^ Tyr His Val Gln Asp Ala Phe Tyr Ser Asp Gly 

1926 TAT GAA GCA CAG TTT GGA GTG CAG CCA ACT TTC 
203^ Tyr Glu Ala Gln Phe Gly Val Gln Pro Thr Phe 

1959 GTT TTT CTG GTT GCC AGC ACA ACT ATT GAA TGC 
214^ Val Phe Leu Val Ala Ser Thr Thr Ile Glu Cys 

1992 GGA CGT TAT CCG GTT GAA ATT TTC ATG ATG GGC 
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Figure 7b (cont'd) 

225^ Gly Arg Tyr Pro Val Glu Ile Phe Met Met Gly 

2025 GAA GAA GCA AAA CTG GCA GGT CAA CAG GAA TAT 
236^ Glu Glu Ala Lys Leu Ala Gly Gln Gln Glu Tyr 

2058 CAC CGC AAT CTG CGA ACC CTC TCT GAC lO: CTG 
247^ His Arg Asn Leu Arg Thr Leu Ser Asp Cys Leu 

BalI 

2091 AAT ACC GAT GAA TGG CCA GCT ATT AAG ACA TTA 
258^ Asn Thr Asp Glu Trp Pro Ala Ile Lys Thr Leu 

2124 TCA CTG CCC CGC TGG GCT AAG GAA TAT GCA A 
269^ Ser Leu Pro Arg Trp Ala Lys Glu Tyr Ala A 

2155 ATG ACT AAG CAA CCA CCA ATC GCA AAA GCC GAT 
1^ Met Thr Lys Gln Pro Pro Ile Ala Lys Ala Asp 
279^ sn Asp *** 

2188 CTG CAA AAA ACT CAG GGA AAC CGT GCA CCA GCA 
12^ Leu Gln Lys Thr Gln Gly Asn Arg Ala Pro Ala 

2221 GCA GTT AAA AAT AGC GAC GTG ATT AGT TTT ATT 
23^ Ala Val Lys Asn Ser Asp Val Ile Ser Phe Ile 

2254 AAC CAG CCA TCA ATG AAA GAG CAA CTG GCA GCA 
34^ Asn Gln Pro Ser Met Lys Glu Gln Leu Ala Ala 

NdeI 

2287 GCT CTT CCA CGC CAT ATG ACG GCT GAA CGT ATG 
45^ Ala Leu Pro Arg His Met Thr Ala Glu Arg Met 
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Figure 7b (cont'd) 
2320 ATC CGT ATC GCC ACC ACA GAA ATT CGT AAA GTT 
56^ Ile Arg Ile Ala Thr Thr Glu Ile Arg Lys Val 

2353 CCG GCG TTA GGA AAC TGT GAC ACT ATG AGT TTT 
67^ Pro Ala Leu Gly Asn Cys Asp Thr Met Ser Phe 

2386 GTC AGT GCG ATC GTA CAG TGT TCA CAG CTC GGA 
78^ Val Ser Ala Ile Val Gln Cys Ser Gln Leu Gly 

2419 CTT GAG CCA GGT AGC GCC CTC GGT CAT GCA TAT 
89^ Leu Glu Pro Gly Ser Ala Leu Gly His Ala Tyr 

2452 TTA CTG CCT TTT GGT AAT AAA AAC GAA AAG AGC 
100^ Leu Leu Pro Phe Gly Asn Lys Asn Glu Lys Ser 

2485 GGT AAA AAG AAC GTT CAG CTA ATC ATT GGC TAT 
111^ Gly Lys Lys Asn Val Gln Leu Ile Ile Gly Tyr 

2518 CGC GGC ATG ATT GAT CTG GCT CGC CGT TCT GGT 
122^ Arg Gly Met Ile Asp Leu Ala Arg Arg Ser Gly 

2551 CAA ATC GCC AGC CTG TCA GCC CGT GTT GTC CGT 
133^ Gln Ile Ala Ser Leu Ser Ala Arg Val Val Arg 

2584 GAA GGT GAC GAG TTT AGC TTC GAA TTT GGC CTT 
144^ Glu Gly Asp Glu Phe Ser Phe Glu Phe Gly Leu 

2617 GAT GAA AAG TTA ATA CAC CGC CCG GGA GAA AAC 
155^ Asp Glu Lys Leu Ile His Arg Pro Gly Glu Asn 

2650 GAA GAT GCC CCG GTT ACC CAC GTC TAT GCT GTC 
166^ Glu Asp Ala Pro Val Thr His Val Tyr Ala Val 

2683 GCA AGA CTG AAA GAC GGA GGT ACT CAG TTT GAA 
177^ Ala Arg Leu Lys Asp Gly Gly Thr Gln Phe Glu 

2716 GTT ATG ACG CGC AAA CAG ATT GAG CTG GTG CGC 
188^ Val Met Thr Arg Lys Gln Ile Glu Leu Val Arg 
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Figure 7b (cont'd) 

2749 AGC CTG AGT AAA GCT GGT AAT AAC GGG CCG TGG 
199^ Ser Leu Ser Lys Ala Gly Asn Asn Gly Pro Trp 

2782 GTA ACT CAC TGG GAA GAA ATG GCA AAG AAA ACG 
210^ Val Thr His Trp Glu Glu Met Ala Lys Lys Thr 

2815 GCT ATT CGT CGC CTG TTC AAA TAT TTG CCC GTA 
221^ Ala Ile Arg Arg Leu Phe Lys Tyr Leu Pro Val 

2848 TCA ATT GAG ATC CAG CGT GCA GTA TCA ATG GAT 
232^ Ser Ile Glu Ile Gln Arg Ala Val Ser Met Asp 

PstI 

2881 GAA AAG GAA CCA CTG ACA ATC GAT CCT GCA GAT 
243^ Glu Lys Glu Pro Leu Thr Ile Asp Pro Ala Asp 

2914 TCC TCT GTA TTA ACC GGG GAA TAC AGT GTA ATC 
254^ Ser Ser Val Leu Thr Gly Glu Tyr Ser Val Ile 

BglII HindIII 

2947 GAT AAT TCA GAG GAA TAG ATCTAAGCTT 
265^ Asp Asn Ser Glu Glu *** 

2975 GGCTGTTTTG GCGGATGAGA GAAGATTTTC AGCCTGATAC 

3015 AGATTAAATC AGAACGCAGA AGCGGTCTGA TAAAACAGAA 

3055 TTTGCCTGGC GGCAGTAGCG CGGTCGTCCC ACCTCACCCC 

3095 ATGCCGAACT CAGAAGTGAA ACGCCGTAGC GCCGATGGTA 

3135 GTGTGGGGTC TCCCCATGCG AGAGTAGGGA ACTGCCAGGC 

3175 ATCAAATAAA ACGAAAGGCT CAGTCGAAAG ACTGGGCCTT 

3215 TCGTTTTATC TGTTGTTTGT CGGTCAACGC TCTCCTCAGT 

3255 AGGACAAATC CGCCGGGAGC GGATTTCAAC GTTCCGAAGC 

3295 AACGGCCCGG AGGGTGGCGG GCAGGACGCC CGCCATAAAC 

3335 TGCCAGGCAT CAAATTAAGC AGAAGGCCAT CCTCACGGAT 



WO 99/29837 PCT/EP98/07945 

Figure 7b (cont'd) 

3375 GGCCTTTTTG CGTTTCTACA AACTCTTTTG TTTATTTTTC 

3415 TAAATACATT CAAATATGTA TCCGCTCATC AGACAATAAC 

3455 CCTGATAAAT GCTTCAATAA TATTGAAAAA GGAAGAGT AT 
1^ Me 

3495 G AGT ATT CAA CAT TTC CGT GTC GCC CTT ATT 
1^ t Ser Ile Gln His Phe Arg Val Ala Leu Ile 

3526 CCC TTT TTT GCG GCA TTT TGC CTT CCT GTT TTT 
12^ Pro Phe Phe Ala Ala Phe Cys Leu Pro Val Phe 

3559 GCT CAC CCA GAA ACG CTG GTG AAA GTA AAA GAT 
23^ Ala His Pro Glu Thr Leu Val Lys Val Lys Asp 

3592 GCT GAA GAT CAG TTG GGT GCA CGA GTG GGT TAC 
34^ Ala Glu Asp Gln Leu Gly Ala Arg Val Gly Tyr 

3625 ATC GAA CTG GAT CTC AAC AGC GGT AAG ATC CTT 
45^ Ile Glu Leu Asp Leu Asn Ser Gly Lys Ile Leu 

3658 GAG AGT TTT CGC CCC GAA GAA CGT TTT CCA ATG 
56^ Glu Ser Phe Arg Pro Glu Glu Arg Phe Pro Met 

3691 ATG AGC ACT TTT AAA GTT CTG CTA TGT GGC GCG 
67^ Met Ser Thr Phe Lys Val Leu Leu Cys Gly Ala 

3724 GTA TTA TCC CGT GTT GAC GCC GGG CAA GAG CAA 
78^ Val Leu Ser Arg Val Asp Ala Gly Gln Glu Gln 

3757 CTC GGT CGC CGC ATA CAC TAT TCT CAG AAT GAC 
89^ Leu Gly Arg Arg Ile His Tyr Ser Gln Asn Asp 

ScaI 

3790 TTG GTT GAG TAC TCA CCA GTC ACA GAA AAG CAT 
100^ Leu Val Glu Tyr Ser Pro Val Thr Glu Lys His 

3823 CTT ACG GAT GGC ATG ACA GTA AGA GAA TTA TGC 
111^ Leu Thr Asp Gly Met Thr Val Arg Glu Leu Cys 
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Figure 7b (cont'd) 




















3856 AGT GCT GCC ATA ACC ATG AGT GAT AAC ACT GCG 
122^ Ser Ala Ala Ile Thr Met Ser Asp Asn Thr Ala 

3889 GCC AAC TTA CTT CTG ACA ACG ATC GGA GGA CCG 
133^ Ala Asn Leu Leu Leu Thr Thr Ile Gly Gly Pro 

3922 AAG GAG CTA ACC GCT TTT TTG CAC AAC ATG GGG 
144^ Lys Glu Leu Thr Ala Phe Leu His Asn Met Gly 

3955 GAT CAT GTA ACT CGC CTT GAT CGT TGG GAA CCG 
155^ Asp His Val Thr Arg Leu Asp Arg Trp Glu Pro 

3988 GAG CTG AAT GAA GCC ATA CCA AAC GAC GAG CGT 
166^ Glu Leu Asn Glu Ala Ile Pro Asn Asp Glu Arg 

4021 GAC ACC ACG ATG CCT GTA GCA ATG GCA ACA ACG 
177^ Asp Thr Thr Met Pro Val Ala Met Ala Thr Thr 

4054 TTG CGC AAA CTA TTA ACT GGC GAA CTA CTT ACT 
188^ Leu Arg Lys Leu Leu Thr Gly Glu Leu Leu Thr 

4087 CTA GCT TCC CGG CAA CAA TTA ATA GAC TGG ATG 
199^ Leu Ala Ser Arg Gln Gln Leu Ile Asp Trp Met 

4120 GAG GCG GAT AAA GTT GCA GGA CCA CTT CTG CGC 
210^ Glu Ala Asp Lys Val Ala Gly Pro Leu Leu Arg 

4153 TCG GCC CTT CCG GCT GGC TGG TTT ATT GCT GAT 
221^ Ser Ala Leu Pro Ala Gly Trp Phe Ile Ala Asp 

4186 AAA TCT GGA GCC GGT GAG CGT GGG TCT CGC GGT 
232^ Lys Ser Gly Ala Gly Glu Arg Gly Ser Arg Gly 

4219 ATC ATT GCA GCA CTG GGG CCA GAT GGT AAG CCC 
243^ Ile Ile Ala Ala Leu Gly Pro Asp Gly Lys Pro 

4252 TCC CGT ATC GTA GTT ATC TAC ACG ACG GGG AGT 
254^ Ser Arg Ile Val Val Ile Tyr Thr Thr Gly Ser 
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Figure 7b (cont'd) 

4285 CAG GCA ACT ATG GAT GAA CGA AAT AGA CAG ATC 
265^ Gln Ala Thr Met Asp Glu Arg Asn Arg Gln Ile 

4318 GCT GAG ATA GGT GCC TCA CTC ATT AAG CAT TGG 
276^ Ala Glu Ile Gly Ala Ser Leu Ile Lys His Trp 

4351 TAA CTGTCAGACC AAGTTTACTC ATATATACTT 
287^ *** 

4384 TAGATTGATT TACGCGCCCT GTAGCGGCGC ATTAAGCGCG 
4424 GCGGGTGTGG TGGTTACGCG CAGCGTGACC GCTACACTTC 
4464 CCAGCGCCCT AGCGCCCGCT CCTTTCGCTT TCTTCCCTTC 
4504 CTTTCTCGCC ACGTTCGCCG GCTTTCCCCG TCAAGCTCTA 
4544 AATCGGGGGC TCCCTTTAGG GTTCCGATTT AGTGCTTTAC 
4584 GGCACCTCGA CCCCAAAAAA CTTGATTTGG GTGATGGTTC 
4624 ACGTAGTGGG CCATCGCCCT GATAGACGGT TTTTCGCCCT 
4664 TTGACGTTGG AGTCCACGTT CTTTAATAGT GGACTCTTGT 
4704 TCCAAACTTG AACAACACTC AACCCTATCT CGGGCTATTC 
4744 TTTTGATTTA TAAGGGATTT TGCCGATTTC GGCCTATTGG 
4784 TTAAAAAATG AGCTGATTTA ACAAAAATTT AACGCGAATT 
4824 TTAACAAAAT ATTAACGTTT ACAATTTAAA AGGATCTAGG 
4864 TGAAGATCCT TTTTGATAAT CTCATGACCA AAATCCCTTA 
4904 ACGTGAGTTT TCGTTCCACT GAGCGTCAGA CCCCGTAGAA 
4944 AAGATCAAAG GATCTTCTTG AGATCCTTTT TTTCTGCGCG 
4984 TAATCTGCTG CTTGCAAACA AAAAAACCAC CGCTACCAGC 
5024 GGTGGTTTGT TTGCCGGATC AAGAGCTACC AACTCTTTTT 
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Figure 7b (cont'd) 

5064 CCGAAGGTAA CTGGCTTCAG CAGAGCGCAG ATACCAAATA 
5104 CTGTCCTTCT AGTGTAGCCG TAGTTAGGCC ACCACTTCAA 
5144 GAACTCTGTA GCACCGCCTA CATACCTCGC TCTGCTAATC 
5184 CTGTTACCAG TGGCTGCTGC CAGTGGCGAT AAGTCGTCTC 
5224 TTACCGGGTT GGACTCAAGA CGATAGTTAC CGGATAAGGC 
5264 GCAGCGGTCG GGCTGAACGG GGGGTTCGTG CACACAGCCC 
5304 AGCTTGGAGC GAACGACCTA CACCGAACTG AGATACCTAC 
5344 AGCGTGAGCT ATGAGAAAGC GCCACGCTTC CCGAAGGGAG 
5384 AAAGGCGGAC AGGTATCCGG TAAGCGGCAG GGTCGGAACA 
5424 GGAGAGCGCA CGAGGGAGCT TCCAGGGGGA AACGCCTGGT 
5464 ATCTTTATAG TCCTGTCGGG TTTCGCCACC TCTGACTTCA 
5504 GCGTCGATTT TTGTGATGCT CGTCAGGGGG GCGGAGCCTA 
5544 TGGAAAAACG CCAGCAACGC GGCCTTTTTA CGGTTCCTGG 
5584 CCTTTTGCTG GCCTTTTGCT CACATGTTCT TTCCTGCGTT 
5624 ATCCCCTGAT TCTGTGGATA ACCGTATTAC CGCCTTTGAG 
5664 TGAGCTGATA CCGCTCGCCG CAGCCGAACG ACCGAGCGCA 
5704 GCGAGTCAGT GAGCGAGGAA GCGGAAGAGC GCCTGATGCG 
5744 GTATTTTCTC CTTACGCATC TGTGCGGTAT TTCACACCGC 
5784 ATAGGGTCAT GGCTGCGCCC CGACACCCGC CAACACCCGC 
5824 TGACGCGCCC TGACGGGCTT GTCTGCTCCC GGCATCCGCT 
5864 TACAGACAAG CTGTGACCGT CTCCGGGAGC TCCATGTGTC 
5904 AGAGGTTTTC ACCGTCATCA CCGAAACGCG CGAGGCAGCA 
5944 AGGAGATGGC GCCCAACAGT CCCCCGGCCA CGGGGCCTGC 
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Figure 7b (cont'd) 

5984 CACCATACCC ACGCCGAAAC AAGCGCTCAT GAGCCCGAAG 
6024 TGGCGAGCCC GATCTTCCCC ATCGGTCATC TCGGCGATAT 
6064 AGGCGCCAGC AACCGCACCT GTGGCGCCGG TGATGCCGGC 
6104 CACGATGCGT CCGGCGTAGA GGATCTCCTC ATCTTTCACA 
6144 GCTTATC 
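
The reading convention of the annotated sequence in Figure 7b can be checked with a short Python sketch. It assumes the standard genetic code and the layout used in the figure: each residue is printed beneath its codon, and for the gene read from the complementary strand (the rows with descending residue numbers) each printed codon must be reverse complemented before translation. The function names are illustrative and not part of the original disclosure.

```python
from itertools import product

_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(_BASES, repeat=3), _AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def translate_row(codon_row: str, reverse_strand: bool = False) -> str:
    """Translate a row of space-separated codons into one-letter residues.

    With reverse_strand=True every codon is reverse complemented first and
    the printed left-to-right order is kept, as in the figure.
    """
    residues = []
    for codon in codon_row.split():
        if len(codon) != 3:
            raise ValueError(f"not a codon: {codon!r}")
        if reverse_strand:
            codon = reverse_complement(codon)
        residues.append(CODON_TABLE[codon])
    return "".join(residues)


if __name__ == "__main__":
    # Row starting at position 1332 (forward strand).
    print(translate_row("ATC GTA GAA GAC ATA GAG CCA GGT ATT TAT TAC"))
    # Row starting at position 912 (gene on the complementary strand).
    print(translate_row("CGC CAC GAG ATG GGC ATT AAA CGA GTA TCC CGG",
                        reverse_strand=True))
```

A row whose translation disagrees with the residues printed beneath it points to a transcription error in either the codon line or the residue line.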
Figure 8b 
Figure 9a 
Figure 9 
Figure 10a 
Figure 10 
Figure 11a 
Figure 11b 
Figure 12 
Figure 13a 
Figure 13b 

1 ATCGATGCATAATGTGCCTGTCAAATGGACGAAGCAGGG 
40 ATTCTGCAAACCCTATGCTACTCCGTCAAGCCGTCAATT 

79 GTCTGATTCGTTACCAA TTA TGA CAA CTT GAC 

293<*** Ser Leu Lys Val 

111 GGC TAG ATC ATT CAC TTT TTC TTC ACA ACC 
288^Ala Val Asp Asn Val Lys Glu Glu Cys Gly 

141 GGC ACG GAA CTC GCT CGG GCT GGC CCC GGT 
278^Ala Arg Phe Glu Ser Pro Ser Ala Gly Thr 

171 GCA TTT TTT AAA TAC CCG CGA GAA ATA GAG 
268^Cys Lys Lys Phe Val Arg Ser Phe Tyr Leu 

201 TTG ATC GTC AAA ACC AAC ATT GCG ACC GAC 
258^Gln Asp Asp Phe Gly Val Asn Arg Gly Val 

231 GGT GGC GAT AGG CAT CCG GGT GGT GCT CAA 
248^Thr Ala Ile Pro Met Arg Thr Thr Ser Leu 

261 AAG CAG CTT CGC CTG GCT GAT ACG TTG GTC 
238^Leu Leu Lys Ala Gln Ser Ile Arg Gln Asp 

291 CTC GCG CCA GCT TAA GAC GCT AAT CCC TAA 
228^Glu Arg Trp Ser Leu Val Ser Ile Gly Leu 

321 CTG CTG GCG GAA AAG ATG TGA CAG ACG CGA 
218^Gln Gln Arg Phe Leu His Ser Leu Arg Ser 

351 CGG CGA CAA GCA AAC ATG CTG TGC GAC GCT 
208^Pro Ser Leu Cys Val His Gln Ala Val Ser 

381 GGC GAT ATC AAA ATT GCT GTC TGC CAG GTG 
198^Ala Ile Asp Phe Asn Ser Asp Ala Leu His 

411 ATC GCT GAT GTA CTG ACA AGC CTC GCG TAC 
Figure 13b (cont'd) 

188^Asp Ser Ile Tyr Gln Cys Ala Glu Arg Val 

441 CCG ATT ATC CAT CGG TGG ATC GAG CGA CTC 
178^Arg Asn Asp Met Pro Pro His Leu Ser Glu 

471 GTT AAT CGC TTC CAT GCG CCG CAG TAA CAA 
168^Asn Ile Ala Glu Met Arg Arg Leu Leu Leu 

501 TTG CTC AAG CAG ATT TAT CGC CAG CAG CTC 
158^Gln Glu Leu Leu Asn Ile Ala Leu Leu Glu 

531 CGA ATA GCG CCC TTC CCC TTC CCC GGC GTT 
148^Ser Tyr Arg Gly Glu Gly Gln Gly Ala Asn 

561 AAT GAT TTG CCC AAA CAG GTC GCT GAA ATG 
138^Ile Ile Gln Gly Phe Leu Asp Ser Phe His 

591 CGG CTG GTG CGC TTC ATC CGG GCG AAA GAA 
128^Pro Gln His Ala Glu Asp Pro Arg Phe Phe 

621 CCC CGT ATT GGC AAA TAT TGA CGG CCA GTT 
118^Gly Thr Asn Ala Phe Ile Ser Pro Trp Asn 

651 AAG CCA TTC ATC CCA GTA GGC GCG CGG ACG 
108^Leu Trp Glu His Trp Tyr Ala Arg Pro Arg 

681 AAA GTA AAC CCA CTC GTC ATA CCA TTC GCG 
98^Phe Tyr Val Trp Gln His Tyr Trp Glu Arg 

711 AGC CTC CGG ATC ACG ACC GTA GTC ATC AAT 
88^Ala Glu Pro His Arg Gly Tyr His His Ile 

741 CTC TCC TGG CGG GAA CAG CAA AAT ATC ACC 
78^Glu Gly Pro Pro Phe Leu Leu Ile Asp Gly 

771 CGG TCG GCA AAC AAA TTC TCG TCC CTC ATT 
68^Pro Arg Cys Val Phe Glu Arg Gly Gln Asn 
Figure 13b (cont'd) 

801 TTT CAC CAC CCC CTG ACC GCG AAT GGT GAG 
58^Lys Val Val Gly Gln Gly Arg Ile Thr Leu 

831 ATT GAG AAT ATA ACC TTT CAT TCC CAG CGG 
48^Asn Leu Ile Tyr Gly Lys Met Gly Leu Pro 

861 TCG GTC GAT AAA AAA ATC GAG ATA ACC GTT 
38^Arg Asp Ile Phe Phe Asp Leu Tyr Gly Asn 

891 GGC CTC AAT CGG CGT TAA ACC CGC CAC CAG 
28^Ala Glu Ile Pro Thr Leu Gly Ala Val Leu 

921 ATG GGC ATT AAA CGA GTA TCC CGG CAG CAG 
18^His Ala Asn Phe Ser Tyr Gly Pro Leu Leu 

951 GGG ATC ATT TTG CGC TTC AGC CAT ACTTTTC 
8^Pro Asp Asn Gln Ala Glu Ala Met 

982 ATACTCCCGCCATTCAGAGAAGAAACCAATTCTCCATAT 
1021 TGCATCAGACATTCCCGTCACTCCGTCTTTTACTGGCTC 
1060 TTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGC 
1099 ATTCTCTAACAAAGCGGGACCAAAGCCATCACAAAAACG 
1138 CGTAACAAAAGTCTCTATAATCACGGCAGAAAAGTCCAC 
1177 ATTGATTATTTGCACGGCGTCACACTTTCCTATCCCATA 

1216 GCATTTTTATCCATAAGATTAGCGGATCCTACCTCACGC 
Figure 13b (cont'd) 

1255 TTTTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTT 
NheI EcoRI NcoI BamHI 

1294 TTTTGGGCTAGCAGGAGGAATTCACC ATG GAT CCC 

1^Met Asp Pro 

1329 GTA ATC GTA GAA GAC ATA GAG CCA GGT ATT 
4^Val Ile Val Glu Asp Ile Glu Pro Gly Ile 

1359 TAT TAC GGA ATT TCG AAT GAG AAT TAC CAC 
14^Tyr Tyr Gly Ile Ser Asn Glu Asn Tyr His 

1389 GCG GGT CCC GGT ATC AGT AAG TCT CAG CTC 
24^Ala Gly Pro Gly Ile Ser Lys Ser Gln Leu 

1419 GAT GAC ATT GCT GAT ACT CCG GCA CTA TAT 
34^Asp Asp Ile Ala Asp Thr Pro Ala Leu Tyr 

1449 TTG TGG CGT AAA AAT GCC CCC GTG GAC ACC 
44^Leu Trp Arg Lys Asn Ala Pro Val Asp Thr 

1479 ACA AAG ACA AAA ACG CTC GAT TTA GGA ACT 
54^Thr Lys Thr Lys Thr Leu Asp Leu Gly Thr 

1509 GCT TTC CAC TGC CGG GTA CTT GAA CCG GAA 
64^Ala Phe His Cys Arg Val Leu Glu Pro Glu 

EcoRI 

1539 GAA TTC AGT AAC CGC TTT ATC GTA GCA CCT 
74^Glu Phe Ser Asn Arg Phe Ile Val Ala Pro 

1569 GAA TTT AAC CGC CGT ACA AAC GCC GGA AAA 
84^Glu Phe Asn Arg Arg Thr Asn Ala Gly Lys 

1599 GAA GAA GAG AAA GCG TTT CTC ATC GAA TGC 
94^Glu Glu Glu Lys Ala Phe Leu Met Glu Cys 

1629 [...] AGC ACA GGA AAA ACG GTT ATC ACT GCG 
104^Ala Ser Thr Gly Lys Thr Val Ile Thr Ala 
Figure 13b (cont'd) 

^lllT. S!^ f° ATT GAA CIC ATG 
ll4^Glu Glu Gly Arg Lys lie Gl u Leu Met Tyr 

^lll ^ <^ ™ CCG CTC GGG CAA 

124^ Gin Ser Val Me, Ala Leu Pro Leu Gly Gin 

°7 AGC or GGA CAC GOT GAA 

134^Trp Leu Val Glu Ser Al a Gl y HI s Al a Gl u 

144^ Ser Ser lie Tyr Trp Glu Asp Pro Glu Thr 

"^►^ f ^ ™3 TOT OGG TCC OST CCG GAC AAA 
154^ Gly lie Leu Cys Arg Cys Arg Pro Asp Lys 

^lel f ?! f GAA TIT CAC TOG ATC ATO GAC 
164Mle lie Pro Glu Phe His Trp lie Me, Asp 

74% ^f^^^"^ '^'^ ATT CAA CGA TIC 

174>Val Lys Thr Thr Al a Asp I I e Gl n A rg Phe 

ACC GCT TAT TAC GAC TAG CGC OAT CAC 
184Kys Thr Ala Tyr Tyr Asp Tyr Arg Tyr Sfs 

1899 GTT CSG GAT GCA TIC TAC AGT GAC GGT TAT 
194>Val Gin Asp Ala Phe Tyr Ser Asp oTy 

^^'^ TIT GGA GTC CAG CCA ACT TTC 
204^Glu Ala Gin Phe Gly Val Gl n ^ T^ ^ 

^"^7^^ ^ ATT GAA 

214>Val Phe Leu Val Ala Ser Thr Thr lie Glu 

"2!^ ^ T f '^^'^ GAA ATT TIC ATO 

224>Cys Gly Arg Tyr Pro Val Glu lie Phe Met 

%^^!^f? «^?^GAAGCAAAACTOGCAGGTCAA 
234KMet Gly Glu Glu Ala Lys Leu Ala Gly Gin 
Figure 13b (cont'd) 

2049 CAG GAA TAT Par nnn ■^v.m 

^r.^.^ z s s s 

2139 GCT AAG GAA Ta^r nn?, ^.^rr. ^pnl 
274M,a L,s ^ ^^-^-^-^ 

2171 GTACCCGAGCACGTGTTCACAATTAATCATCGGCATAGT 
2210 ATATCGGCATAGTATAATACGACAAGGTCAGGAACTAAA 

CAA CCA CCA ATC GCA AM 
l>Mot Ala Lys Gin Pro Pro I I e ^ 

- - - « ^ MC 

2338 AW ACT TIT ATT AAC CAG CCA ICA ATC AAA 
30Mle Ser Phe He Asn Gin Pro Ser Me , ^ 

''40.^^ ^ COT CC Cac CAT 

Leu Ala Ala Ala Leu Pro A rg His 

r ffj s - - f?j - 
Figure 13b (cont'd) 

2458 GGA AAC TGT GAC ACT ATC AGT TTT GTC AGT 
70^Gly Asn Cys Asp Thr Met Ser Phe Val Ser 

2488 GCG ATC GTA CAG TCT TCA CAG CTC GGA CTT 
80^Ala Ile Val Gln Cys Ser Gln Leu Gly Leu 

2518 GAG CCA GGT AGC GCC CTC GGT CAT GCA TAT 
90^Glu Pro Gly Ser Ala Leu Gly His Ala Tyr 

2548 TTA CTC CCT TTT GGT AAT AAA AAC GAA AAG 
100^Leu Leu Pro Phe Gly Asn Lys Asn Glu Lys 

2578 AGC GGT AAA AAG AAC GTT CAG CTA ATC ATT 
110^Ser Gly Lys Lys Asn Val Gln Leu Ile Ile 

2608 GGC TAT CGC GGC ATC ATT GAT CTC GCT CGC 
120^Gly Tyr Arg Gly Met Ile Asp Leu Ala Arg 

2638 CGT TCT GGT CAA ATC GCC AGC CTC TCA GCC 
130^Arg Ser Gly Gln Ile Ala Ser Leu Ser Ala 

2668 CGT GTT GTC CGT GAA GGT GAC GAG TTT AGC 
140^Arg Val Val Arg Glu Gly Asp Glu Phe Ser 

2698 TTC GAA TTT GGC CTT GAT GAA AAG TTA ATA 
150^Phe Glu Phe Gly Leu Asp Glu Lys Leu Ile 

2728 CAC CGC CCG GGA GAA AAC GAA GAT GCC CCG 
160^His Arg Pro Gly Glu Asn Glu Asp Ala Pro 

2758 GTT ACC CAC GTC TAT GCT GTC GCA AGA CTC 
170^Val Thr His Val Tyr Ala Val Ala Arg Leu 

2788 AAA GAC GGA GGT ACT CAG TTT GAA GTT ATC 
180^Lys Asp Gly Gly Thr Gln Phe Glu Val Met 

2818 ACG CGC AAA CAG ATT GAG CTC GTC CGC AGC 
190^Thr Arg Lys Gln Ile Glu Leu Val Arg Ser 
Figure 13b (cont'd) 

200^ Leu Ser Lys Ala Gly Asn Asn Gl y P ro 

210>Val Thr His Trp Gl u Gl u Met Ala Lys Lys 

ff^ ATT CX3T CGC era ITC AAA TAT TIG 
220^T1ir Ala 1 1 e A rg A rg Leu Phe Lys Tyr Leu 

^lllk P^^ v^'t ^ ATT GAG ATC GAG CGT GCA GTA 
230>Pro Val Ser He Gl u lie Gin A rg Ala Vaf 

2968 TCA ATG GAT GAA AAG GAA CCA CTC ACA Air 
240^ Ser Me, Asp Gl u Lys Gl u Pro Le^ f^^ 

^P^n^f^ per GCA GAT TCC TCT GTA TPA ACC GGG 
250Msp Pro Ala Asp Ser Ser Val Leu Thr oTy 

\°6n>^ T^^ AAT TCA GAG GAA 

260>'Glu Tyr Ser Val lie Asp Asn Ser Gl u Gl u 

BglII HindIII 
^270»Jt': "^^^^^^CTCKTCAACATCAAAGGCAAGAAA 

3096 ACATCTGTTGTCAAAGACAGCATCCTTCAACAAGGACAA 

3135 TTAACAGTTAACAAATAAAAACGCAAAAGAAAATCCCGA 

3174 TATCCTATTCGCATTTTCTTTTATTTCTTATCAACATAA 

XhoI 

3213 AGGTGAATCCCATACCTCGAGCTTCACGCTCCCGCAAGC 
3252 ACTCAGGGCGCAAGGGCTCCTAAAAGGAAGCGGAACACG 
3291 TAGAAAGCCAGTCCGCAGAAACGGTCCTCACCCCGGATC 
3330 AATGTCAGCTACTGGGCTATCTCGACAAGGGAAAACGCA 
3369 AGCGCAAAGAGAAAGCAGGTAGCTTCCAGTCGGCTTACA 
Figure 13b (cont'd) 

3408 TGGCGATAGCTAGACTGGGCGGTTTTATCGACAGCAAGC 

3447 GAACCGGAATTGCCAGCTGGGGCGCCCTCTGGTAAGGTT 

3486 GGGAAGCCCTGCAAAGTAAACTGGATGGCTTTCTTCCCG 

BglII 

3525 CCAAGGATCTGATGGCGCAGGGGATCAAGATCTGATCAA 

3564 GAGACAGGATGAGGATCGTTTCGC ATG GAT ATT 

1^Met Asp Ile 

3597 AAT ACT GAA ACT GAG ATC AAG CAA AAG CAT 
4^Asn Thr Glu Thr Glu Ile Lys Gln Lys His 

3627 TCA CTA ACC CCC TTT CCT GTT TTC CTA ATC 
14^Ser Leu Thr Pro Phe Pro Val Phe Leu Ile 

3657 AGC CCG GCA TTT CGC GGG CGA TAT TTT CAC 
24^Ser Pro Ala Phe Arg Gly Arg Tyr Phe His 

3687 AGC TAT TTC AGG AGT TCA GCC ATG AAC GCT 
34^Ser Tyr Phe Arg Ser Ser Ala Met Asn Ala 

3717 TAT TAC ATT CAG GAT CGT CTT GAG GCT CAG 
44^Tyr Tyr Ile Gln Asp Arg Leu Glu Ala Gln 

3747 AGC TGG GCG CGT CAC TAC CAG CAG CTC GCC 
54^Ser Trp Ala Arg His Tyr Gln Gln Leu Ala 

3777 CGT GAA GAG AAA GAG GCA GAA CTG GCA GAC 
64^Arg Glu Glu Lys Glu Ala Glu Leu Ala Asp 

3807 GAC ATG GAA AAA GGC CTG CCC CAG CAC CTG 
74^Asp Met Glu Lys Gly Leu Pro Gln His Leu 

3837 TTT GAA TCG CTA TGC ATC GAT CAT TTG CAA 
84^Phe Glu Ser Leu Cys Ile Asp His Leu Gln 

3867 CGC CAC GGG GCC AGC AAA AAA TCC ATT ACC 
94^Arg His Gly Ala Ser Lys Lys Ser Ile Thr 
Figure 13b (cont'd) 

3897 CGT GCG TTT GAT GAC GAT GTT GAG TTT CAG 
104^Arg Ala Phe Asp Asp Asp Val Glu Phe Gln 

3927 GAG CGC ATG GCA GAA CAC ATC CGG TAC ATG 
114^Glu Arg Met Ala Glu His Ile Arg Tyr Met 

3957 GTT GAA ACC ATT GCT CAC CAC CAG GTT GAT 
124^Val Glu Thr Ile Ala His His Gln Val Asp 

3987 ATT GAT TCA GAG GTA TAA AACGAGTAGAAGCT 
134^Ile Asp Ser Glu Val ••• 



4019 TGGCTGTTTTGGCGGATGAGAGAAGATTTTCAGCCTGAT 
4058 ACAGATTAAATCAGAACGCAGAAGCGGTCTGATAAAACA 
4097 GAATTTGCCTGGCGGCAGTAGCGCGGTCGTCCCACCTCA 
4136 CCCCATGCCGAACTCAGAAGTCAAACGCCGTAGCGCCGA 
4175 TGGTAGTGTGGGGTCTCCCCATCCGAGAGTAGGGAACTC 
4214 CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACT 
4253 GGGCCTTTCGTTTTATCTGTTCTTTCTCGGTCAACGCTC 
4292 TCCTGAGTAGGACAAATCCGCCGGGAGCGGATTTCAACG 
4331 TTGCGAAGCAACGGCCCGGAGGGTCGCGGGCAGGACGCC 
4370 CGCCATAAACTGCCAGGCATCAAATTAAGCAGAAGGCCA 
4409 TCCTGACGGATGGCCTTTTTCCGTTTCTACAAACTCTTT 
4448 TGTTTATTTTTCTAAATACATTCAAATATCTATCCGCTC 
4487 ATGAGACAATAACCCTGATAAATCCTTCAATAATATTCA 
4526 AAAAGGAAGAGT ATG AGT ATT CAA CAT TTC 

1^Met Ser Ile Gln His Phe 
Figure 13b (cont'd) 

4946 CTG ACA ACG ATC GGA GGA CCG AAG GAG CTA 
137^Leu Thr Thr Ile Gly Gly Pro Lys Glu Leu 

4976 ACC GCT TTT TTG CAC AAC ATC GGG GAT CAT 
147^Thr Ala Phe Leu His Asn Met Gly Asp His 

5006 GTA ACT CGC CTT GAT CGT TCG GAA CCG GAG 
157^Val Thr Arg Leu Asp Arg Trp Glu Pro Glu 

5036 CTG AAT GAA GCC ATA CCA AAC GAC GAG CGT 
167^Leu Asn Glu Ala Ile Pro Asn Asp Glu Arg 

5066 GAC ACC ACG ATC CCT GTA GCA ATC GCA ACA 
177^Asp Thr Thr Met Pro Val Ala Met Ala Thr 

5096 ACG TTG CGC AAA CTA TTA ACT GGC GAA CTA 
187^Thr Leu Arg Lys Leu Leu Thr Gly Glu Leu 

5126 CTT ACT CTA GCT TCC CGG CAA CAA TTA ATA 
197^Leu Thr Leu Ala Ser Arg Gln Gln Leu Ile 

5156 GAC TGG ATC GAG GCG GAT AAA GTT GCA GGA 
207^Asp Trp Met Glu Ala Asp Lys Val Ala Gly 

5186 CCA CTT CTC CGC TCG GCC CTT CCG GCT GGC 
217^Pro Leu Leu Arg Ser Ala Leu Pro Ala Gly 

5216 TGG TTT ATT GCT GAT AAA TCT GGA GCC GGT 
227^Trp Phe Ile Ala Asp Lys Ser Gly Ala Gly 

5246 GAG CGT GGG TCT CGC GGT ATC ATT GCA GCA 
237^Glu Arg Gly Ser Arg Gly Ile Ile Ala Ala 

5276 CTC GGG CCA GAT GGT AAG CCC TCC CGT ATC 
247^Leu Gly Pro Asp Gly Lys Pro Ser Arg Ile 

5306 GTA GTT ATC TAC ACG ACG GGG AGT CAG GCA 
257^Val Val Ile Tyr Thr Thr Gly Ser Gln Ala 
Figure 13b (cont'd) 

5336 ACT ATG GAT GAA CGA AAT AGA CAG ATC GCT 
267^Thr Met Asp Glu Arg Asn Arg Gln Ile Ala 

5366 GAG ATA GGT GCC TCA CTG ATT AAG CAT TGG 
277^Glu Ile Gly Ala Ser Leu Ile Lys His Trp 

5396 TAA CTGTCAGACCAAGTTTACTCATATATACTTTAGAT 

5434 TGATTTACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGG 
5473 GTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCA 
5512 GCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCT 
5551 TTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAA 
5590 ATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTAC 
5629 GGCACCTCGACCCCAAAAAACTTGATTTGGGTGATGGTT 
5668 CACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCC 
5707 CTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCT 
5746 TGTTCCAAACTTGAACAACACTCAACCCTATCTCGGGCT 
5785 ATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGCCT 
5824 ATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACG 
5863 CGAATTTTAACAAAATATTAACGTTTACAATTTAAAAGG 
5902 ATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAA 
5941 ATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGAC 
5980 CCCGTAGAAAAGATCAAAGGATCTTCTTCAGATCCTTTT 
6019 TTTCTGCGCGTAATCTGCTGCTTCCAAACAAAAAAACCA 
6058 CCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTA 
6097 CCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCG 
6136 CAGATACCAAATACTGTCCTTCTAGTGTAGCCGTAGTTA 
6175 GGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATAC 
6214 CTCGCTCTGCTAATCCTGTTACCAGTCGCTCCTCCCAGT 
6253 GGCGATAAGTCGTGTCTTACCGGGTTCGACTCAAGACGA 
6292 TAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGG 
6331 GGTTCGTGCACACAGCCCAGCTTCGAGCGAACGACCTAC 
6370 ACCGAACTGAGATACCTACAGCGTCAGCTATCAGAAAGC 
6409 GCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCG 
6448 GTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAG 
6487 CTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTC 
6526 GGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTC 
6565 TGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGC 
6604 AACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCT 
6643 TTTGCTCACATCTTCTTTCCTGCGTTATGCCCTGATTCT 
6682 GTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACC 
6721 GCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTG 
6760 AGCGAGGAAGCGGAAGAGCGCCTGATCCGGTATTTTCTC 
6799 CTTACGCATCTGTGCGGTATTTCACACCGCATAGGGTCA 
6838 TGGCTGCGCCCCGACACCCGCCAACACCCGCTGACGCGC 
6877 CCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGAC 
6916 AAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGT 
6955 TTTCACCGTCATCACCGAAACGCGCGAGGCAGCAAGGAG 
Figure 13b (cont'd) 

6994 ATGGCGCCCAACAGTCCCCCGGCCACGGGGCCTGCCACC 

7033 ATACCCACGCCGAAACAAGCGCTCATGAGCCCGAAGTCG 

7072 CGAGCCCGATCTTCCCCATCGGTGATCTCGGCGATATAG 

7111 GCGCCAGCAACCGCACCTGTGGCGCCGGTCATGCCGGCC 

7150 ACGATGCGTCCGGCGTAGAGGATCTGCTCATGTTTCACA 
7189 GCTTATC 
Figure 14a 

EcoRV 

ScaI 
Figure 14b 

NsiI 

1 ATCGATGCATAATGTGCCTGTCAAATGGACGAAGCAGGG 
40 ATTCTGCAAACCCTATGCTACTCCGTCAAGCCGTCAATT 
79 GTCTGATTCGTTACCAA TTA TGA CAA CTT GAC 

293<*** Ser Leu Lys Val 

111 GGC TAG ATC ATT CAC TTT TTC TTC ACA ACC 
288^Ala Val Asp Asn Val Lys Glu Glu Cys Gly 

141 GGC ACG GAA CTC GCT CGG GCT GGC CCC GGT 
278^Ala Arg Phe Glu Ser Pro Ser Ala Gly Thr 

171 GCA TTT TTT AAA TAC CCG CGA GAA ATA GAG 
268^Cys Lys Lys Phe Val Arg Ser Phe Tyr Leu 

201 TTG ATC GTC AAA ACC AAC ATT GCG ACC GAC 
258^Gln Asp Asp Phe Gly Val Asn Arg Gly Val 

231 GGT GGC GAT AGG CAT CCG GGT GGT GCT CAA 
248^Thr Ala Ile Pro Met Arg Thr Thr Ser Leu 

261 AAG CAG CTT CGC CTG GCT GAT ACG TTG GTC 
238^Leu Leu Lys Ala Gln Ser Ile Arg Gln Asp 

291 CTC GCG CCA GCT TAA GAC GCT AAT CCC TAA 
228^Glu Arg Trp Ser Leu Val Ser Ile Gly Leu 

321 CTG CTG GCG GAA AAG ATG TGA CAG ACG CGA 
218^Gln Gln Arg Phe Leu His Ser Leu Arg Ser 

351 CGG CGA CAA GCA AAC ATG CTG TGC GAC GCT 
208^Pro Ser Leu Cys Val His Gln Ala Val Ser 
EcoRV 

381 GGC GAT ATC AAA ATT GCT GTC TGC CAG GTG 
198^Ala Ile Asp Phe Asn Ser Asp Ala Leu His 
Figure 14b (cont'd) 

411 ATC GCT GAT GTA CTG ACA AGC CTC GCG TAC 
188^Asp Ser Ile Tyr Gln Cys Ala Glu Arg Val 

441 CCG ATT ATC CAT CGG TGG ATC GAG CGA CTC 
178^Arg Asn Asp Met Pro Pro His Leu Ser Glu 

471 GTT AAT CGC TTC CAT GCG CCG CAG TAA CAA 
168^Asn Ile Ala Glu Met Arg Arg Leu Leu Leu 

501 TTG CTC AAG CAG ATT TAT CGC CAG CAG CTC 
158^Gln Glu Leu Leu Asn Ile Ala Leu Leu Glu 

531 CGA ATA GCG CCC TTC CCC TTC CCC GGC GTT 
148^Ser Tyr Arg Gly Glu Gly Gln Gly Ala Asn 

561 AAT GAT TTG CCC AAA CAG GTC GCT GAA ATG 
138^Ile Ile Gln Gly Phe Leu Asp Ser Phe His 

591 CGG CTG GTG CGC TTC ATC CGG GCG AAA GAA 
128^Pro Gln His Ala Glu Asp Pro Arg Phe Phe 

621 CCC CGT ATT GGC AAA TAT TGA CGG CCA GTT 
118^Gly Thr Asn Ala Phe Ile Ser Pro Trp Asn 

651 AAG CCA TTC ATC CCA GTA GGC GCG CGG ACG 
108^Leu Trp Glu His Trp Tyr Ala Arg Pro Arg 

681 AAA GTA AAC CCA CTC GTC ATA CCA TTC GCG 
98^Phe Tyr Val Trp Gln His Tyr Trp Glu Arg 

711 AGC CTC CGG ATC ACG ACC GTA GTC ATC AAT 
88^Ala Glu Pro His Arg Gly Tyr His His Ile 

741 CTC TCC TGG CGG GAA CAG CAA AAT ATC ACC 
78^Glu Gly Pro Pro Phe Leu Leu Ile Asp Gly 

771 CGG TCG GCA AAC AAA TTC TCG TCC CTC ATT 
68^Pro Arg Cys Val Phe Glu Arg Gly Gln Asn 
Figure 14b (cont'd) 

801 TTT CAC CAC CCC CTG ACC GCG AAT GGT GAG 
58^Lys Val Val Gly Gln Gly Arg Ile Thr Leu 

831 ATT GAG AAT ATA ACC TTT CAT TCC CAG CGG 
48^Asn Leu Ile Tyr Gly Lys Met Gly Leu Pro 

861 TCG GTC GAT AAA AAA ATC GAG ATA ACC GTT 
38^Arg Asp Ile Phe Phe Asp Leu Tyr Gly Asn 

891 GGC CTC AAT CGG CGT TAA ACC CGC CAC CAG 
28^Ala Glu Ile Pro Thr Leu Gly Ala Val Leu 

921 ATG GGC ATT AAA CGA GTA TCC CGG CAG CAG 
18^His Ala Asn Phe Ser Tyr Gly Pro Leu Leu 

951 GGG ATC ATT TTG CGC TTC AGC CAT ACTTTTC 
8^Pro Asp Asn Gln Ala Glu Ala Met 

982 ATACTCCCGCCATTCAGAGAAGAAACCAATTGTCCATAT 

1021 TGCATCAGACATTGCCGTCACTGCGTCTTTTACTGGCTC 

1060 TTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGC 

1099 ATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACG 

1138 CGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCAC 

1177 ATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATA 

BamHI 

1216 GCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGC 

1255 TTTTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTT 

NheI EcoRI 
1294 TTTTGGGCTAGCAGGAGGAATTCACC ATG ACA CCG 

1^Met Thr Pro 

PstI 

1329 GAC ATT ATC CTG CAG CGT ACC GGG ATC GAT 
Figure 14b (cont'd) 

4^Asp Ile Ile Leu Gln Arg Thr Gly Ile Asp 

1359 GTG AGA GCT GTC GAA CAG GGG GAT GAT GCG 
14^Val Arg Ala Val Glu Gln Gly Asp Asp Ala 

1389 TGG CAC AAA TTA CGG CTC GGC GTC ATC ACC 
24^Trp His Lys Leu Arg Leu Gly Val Ile Thr 

1419 GCT TCA GAA GTT CAC AAC GTC ATA GCA AAA 
34^Ala Ser Glu Val His Asn Val Ile Ala Lys 

1449 CCC CGC TCC GGA AAG AAG TCG CCT GAC ATC 
44^Pro Arg Ser Gly Lys Lys Trp Pro Asp Met 

1479 AAA ATC TCC TAC TTC CAC ACC CTC CTT GCT 
54^Lys Met Ser Tyr Phe His Thr Leu Leu Ala 

1509 GAG GTT TGC ACC GGT GTC GCT CCG GAA GTT 
64^Glu Val Cys Thr Gly Val Ala Pro Glu Val 

1539 AAC GCT AAA GCA CTC GCC TGG GGA AAA CAG 
74^Asn Ala Lys Ala Leu Ala Trp Gly Lys Gln 

EcoRI 
1569 TAC GAG AAC GAC GCC AGA ACC CTC TTT GAA 
84^Tyr Glu Asn Asp Ala Arg Thr Leu Phe Glu 

1599 TTC ACT TCC GGC GTC AAT GTT ACT GAA TCC 
94^Phe Thr Ser Gly Val Asn Val Thr Glu Ser 

1629 CCG ATC ATC TAT CGC GAC GAA AGT ATC CGT 
104^Pro Ile Ile Tyr Arg Asp Glu Ser Met Arg 

1659 ACC GCC TGC TCT CCC GAT GGT TTA TCC AGT 
114^Thr Ala Cys Ser Pro Asp Gly Leu Cys Ser 

1689 GAC GGC AAC GGC CTT GAA CTC AAA TCC CCG 
124^Asp Gly Asn Gly Leu Glu Leu Lys Cys Pro 
Figure 14b (cont'd) 

1719 TTT ACC TCC CGG GAT TTC ATC AAG TTC CGG 
134^Phe Thr Ser Arg Asp Phe Met Lys Phe Arg 

1749 CTC GGT GGT TTC GAG GCC ATA AAG TCA GCT 
144^Leu Gly Gly Phe Glu Ala Ile Lys Ser Ala 

1779 TAG ATG GCC CAG GTG GAG TAG AGC ATC TCG 
154^Tyr Met Ala Gln Val Gln Tyr Ser Met Trp 

1809 GTG ACG CGA AAA AAT GCC TGG TAG TTT GCC 
164^Val Thr Arg Lys Asn Ala Trp Tyr Phe Ala 

1839 AAC TAT GAG CCG CGT ATG AAG CGT GAA GGC 
174^Asn Tyr Asp Pro Arg Met Lys Arg Glu Gly 

1869 GTG CAT TAT GTC GTG ATT GAG CGG GAT GAA 
184^Leu His Tyr Val Val Ile Glu Arg Asp Glu 

1899 AAG TAG ATG GCG ACT TTT GAG GAG ATC GTG 
194^Lys Tyr Met Ala Ser Phe Asp Glu Ile Val 

1929 CCG GAG TTC ATC GAA AAA ATG GAG GAG GCA 
204^Pro Glu Phe Ile Glu Lys Met Asp Glu Ala 

1959 GTG GCT GAA ATT GGT TTT GTA TTT GGG GAG 
214^Leu Ala Glu Ile Gly Phe Val Phe Gly Glu 

KpnI 

1989 GAA TGG CGA TAGATCCGGTACCCGAGCACGTGTTGA 
224^Gln Trp Arg ••• 

2025 CAATTAATCATCGGCATAGTATATCGGCATAGTATAATA 
2064 CGACAAGGTGAGGAACTAAACC ATG AGT ACT GCA 

1^Met Ser Thr Ala 

2098 CTC GCA ACG GTG GCT GGG AAG CTC GCT GAA 
5^Leu Ala Thr Leu Ala Gly Lys Leu Ala Glu 
Figure 14b (cont'd) 

SalI 

2128 CGT GTC GGC ATG GAT TCT GTC GAC CCA CAG 
15^Arg Val Gly Met Asp Ser Val Asp Pro Gln 

2158 GAA CTG ATC ACC ACT CTT CGC CAG ACG GCA 
25^Glu Leu Ile Thr Thr Leu Arg Gln Thr Ala 

2188 TTT AAA GGT GAT GCC AGC GAT GCG CAG TTC 
35^Phe Lys Gly Asp Ala Ser Asp Ala Gln Phe 

2218 ATC GCA TTA CTG ATC GTT GCC AAC CAG TAC 
45^Ile Ala Leu Leu Ile Val Ala Asn Gln Tyr 

2248 GGC CTT AAT CCG TGG ACG AAA GAA ATT TAC 
55^Gly Leu Asn Pro Trp Thr Lys Glu Ile Tyr 

2278 GCC TTT CCT GAT AAG CAG AAT GGC ATC GTT 
65^Ala Phe Pro Asp Lys Gln Asn Gly Ile Val 

2308 CCG GTG GTG GGC GTT GAT GGC TGG TCC CGC 
75^Pro Val Val Gly Val Asp Gly Trp Ser Arg 

2338 ATC ATC AAT GAA AAC CAG CAG TTT GAT GGC 
85^Ile Ile Asn Glu Asn Gln Gln Phe Asp Gly 

2368 ATG GAC TTT GAG CAG GAC AAT GAA TCC TGT 
95^Met Asp Phe Glu Gln Asp Asn Glu Ser Cys 

2398 ACA TGC CGG ATT TAC CGC AAG GAC CGT AAT 
105^Thr Cys Arg Ile Tyr Arg Lys Asp Arg Asn 

2428 CAT CCG ATC TGC GTT ACC GAA TGG ATG GAT 
115^His Pro Ile Cys Val Thr Glu Trp Met Asp 

2458 GAA TGC CGC CGC GAA CCA TTC AAA ACT CGC 
125^Glu Cys Arg Arg Glu Pro Phe Lys Thr Arg 

2488 GAA GGC AGA GAA ATC ACG GGG CCG TCG CAG 
135^Glu Gly Arg Glu Ile Thr Gly Pro Trp Gln 
Figure 14b (cont'd) 

2518 TCG CAT CCC AAA CGG ATG TTA CGT CAT AAA 
145^Ser His Pro Lys Arg Met Leu Arg His Lys 

2548 GCC ATG ATT CAG TGT GCC CGT CTG GCC TTC 
155^Ala Met Ile Gln Cys Ala Arg Leu Ala Phe 

2578 GGA TTT GCT GGT ATC TAT GAC AAG GAT GAA 
165^Gly Phe Ala Gly Ile Tyr Asp Lys Asp Glu 

2608 GCC GAG CGC ATT GTC GAA AAT ACT GCA TAC 
175^Ala Glu Arg Ile Val Glu Asn Thr Ala Tyr 
PstI 

2638 ACT GCA GAA CGT CAG CCG GAA CGC GAC ATC 
185^Thr Ala Glu Arg Gln Pro Glu Arg Asp Ile 

2668 ACT CCG GTT AAC GAT GAA ACC ATG CAG GAG 
195^Thr Pro Val Asn Asp Glu Thr Met Gln Glu 

2698 ATT AAC ACT CTG CTC ATC GCC CTC GAT AAA 
205^Ile Asn Thr Leu Leu Ile Ala Leu Asp Lys 

2728 ACA TGG GAT GAC GAC TTA TTG CCG CTC TCT 
215^Thr Trp Asp Asp Asp Leu Leu Pro Leu Cys 

2758 TCC CAG ATA TTT CGC CGC GAC ATT CGT GCA 
225^Ser Gln Ile Phe Arg Arg Asp Ile Arg Ala 

2788 TCG TCA GAA CTC ACA CAG GCC GAA GCA GTA 
235^Ser Ser Glu Leu Thr Gln Ala Glu Ala Val 

2818 AAA GCT CTT GGA TTC CTC AAA CAG AAA GCC 
245^Lys Ala Leu Gly Phe Leu Lys Gln Lys Ala 

BglII XhoI 

2848 GCA GAG CAG AAG GTC GCA GCA TAGATCTCGAG 
255^Ala Glu Gln Lys Val Ala Ala ••• 
Figure 14b (cont'd) 

HindIII 

2880 AAGCTTCCTGCTGAACATCAAAGGCAAGAAAACATCTGT 

2919 TGTCAAAGACAGCATCCTTGAACAAGGACAATTAACAGT 

2958 TAACAAATAAAAACGCAAAAGAAAATGCCGATATCCTAT 

2997 TCGCATTTTCTTTTATTTCTTATCAACATAAAGGTGAAT 

XhoI 

3036 CCCATACCTCGAGCTTCACGCTGCCGCAAGCACTCAGGG 
3075 CGCAAGGGCTGCTAAAAGGAAGCGGAACACGTAGAAAGC 
3114 CAGTCCGCAGAAACGGTGCTGACCCCGGATGAATGTCAG 
3153 CTACTGGGCTATCTGGACAAGGGAAAACGCAAGCGCAAA 
3192 GAGAAAGCAGGTAGCTTGCAGTGGGCTTACATGGCGATA 
3231 GCTAGACTGGGCGGTTTTATGGACAGCAAGCGAACCGGA 
PvuII 

3270 ATTGCCAGCTGGGGCGCCCTCTGGTAAGGTTGGGAAGCC 

3309 CTGCAAAGTAAACTGGATGGCTTTCTTGCCGCCAAGGAT 

Bglll 

3348 CTGATGGCGCAGGGGATCAAGATCTGATCAAGAGACAGG 

3387 ATGAGGATCGTTTCGC ATG GAT ATT AAT ACT 

1^Met Asp Ile Asn Thr 

3418 GAA ACT GAG ATC AAG CAA AAG CAT TCA CTA 
6^Glu Thr Glu Ile Lys Gln Lys His Ser Leu 

3448 ACC CCC TTT CCT GTT TTC CTA ATC AGC CCG 
16^Thr Pro Phe Pro Val Phe Leu Ile Ser Pro 

3478 GCA TTT CGC GGG CGA TAT TTT CAC AGC TAT 
26^Ala Phe Arg Gly Arg Tyr Phe His Ser Tyr 

3508 TTC AGG AGT TCA GCC ATG AAC GCT TAT TAC 
36^Phe Arg Ser Ser Ala Met Asn Ala Tyr Tyr 
Figure 14b (cont'd) 

3538 ATT CAG GAT CGT CTT GAG GCT GAG AGC TGG 
46^Ile Gln Asp Arg Leu Glu Ala Gln Ser Trp 

3568 GCG CGT CAC TAG CAG CAG CTC GCC CGT GAA 
56^Ala Arg His Tyr Gln Gln Leu Ala Arg Glu 

3598 GAG AAA GAG GCA GAA CTG GCA GAG GAG ATG 
66^Glu Lys Glu Ala Glu Leu Ala Asp Asp Met 

3628 GAA AAA GGC CTG CCC CAG CAC CTG TTT GAA 
76^Glu Lys Gly Leu Pro Gln His Leu Phe Glu 

3658 TCG CTA TGC ATC GAT CAT TTG CAA CGC CAC 
86^Ser Leu Cys Ile Asp His Leu Gln Arg His 

3688 GGG GCC AGC AAA AAA TCC ATT ACC CGT GCG 
96^Gly Ala Ser Lys Lys Ser Ile Thr Arg Ala 

3718 TTT GAT GAG GAT GTT GAG TTT CAG GAG CGC 
106^Phe Asp Asp Asp Val Glu Phe Gln Glu Arg 

3748 ATG GCA GAA CAC ATC CGG TAG ATG GTT GAA 
116^Met Ala Glu His Ile Arg Tyr Met Val Glu 

3778 ACC ATT GCT CAC CAC CAG GTT GAT ATT GAT 
126^Thr Ile Ala His His Gln Val Asp Ile Asp 

HindIII 

3808 TCA GAG GTA TAA AACGAGTAGA AGC TTG GCT 
136^Ser Glu Val ••• 

3839 GTT TTG GCG GAT GAG AGA AGA TTT TCA GCC 

3869 TGA TACAGATTAAATCAGAACGCAGAAGCGGTCTGATA 

3907 AAACAGAATTTGCCTGGCGGCAGTAGCGCGGTGGTCCCA 

3946 CCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGC 

3985 GCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGG 
Figure 14b (cont'd) 

4024 AACTGCCAGGCATCAAATAAAACGAAAGGCTCAGTCGAA 

4063 AGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAA 

4102 CGCTCTCCTGAGTAGGACAAATCCGCCGGGAGCGGATTT 

4141 GAACGTTGCGAAGCAACGGCCCGGAGGGTGGCGGGCAGG 

4180 ACGCCCGCCATAAACTGCCAGGCATCAAATTAAGCAGAA 

4219 GGCCATCCTGACGGATGGCCTTTTTGCGTTTCTACAAAC 

4258 TCTTTTGTTTATTTTTCTAAATACATTCAAATATGTATC 

4297 CGCTCATGAGACAATAACCCTGATAAATGCTTCAATAAT 

4336 ATTGAAAAAGGAAGAGT ATG AGT ATT CAA CAT 
1^Met Ser Ile Gln His 

4368 TTC CGT GTC GCC [...] ATT CCC [...] TTT GCG 
6^Phe Arg Val Ala Leu Ile Pro Phe Phe Ala 

4398 GCA [...] TGC CTT CCT GTT TTT GCT CAC CCA 
16^Ala Phe Cys Leu Pro Val Phe Ala His Pro 

4428 GAA ACG CTG GTG AAA GTA AAA GAT GCT GAA 
26^Glu Thr Leu Val Lys Val Lys Asp Ala Glu 

4458 GAT CAG TTG GGT GCA CGA GTG GGT TAC ATC 
36^Asp Gln Leu Gly Ala Arg Val Gly Tyr Ile 

4488 GAA CTG GAT CTC AAC AGC GGT AAG ATC CTT 
46^Glu Leu Asp Leu Asn Ser Gly Lys Ile Leu 

4518 GAG AGT TTT CGC CCC GAA GAA CGT TTT CCA 
56^Glu Ser Phe Arg Pro Glu Glu Arg Phe Pro 

4548 ATG ATG AGC ACT TTT AAA GTT CTG CTA TGT 
66^Met Met Ser Thr Phe Lys Val Leu Leu Cys 
Figure 14b (cont'd) 

4578 GGC GCG GTA TTA TCC CGT GTT GAC GCC GGG 
76^Gly Ala Val Leu Ser Arg Val Asp Ala Gly 

4608 CAA GAG CAA CTC GGT CGC CGC ATA CAC TAT 
86^Gln Glu Gln Leu Gly Arg Arg Ile His Tyr 

ScaI 

4638 TCT GAG AAT GAG [...] [...] GAG TAG [...] CCA 
96^Ser Gln Asn Asp Leu Val Glu Tyr Ser Pro 

4668 GTC ACA GAA AAG [...] [...] ACG GAT [...] [...] 
106^Val Thr Glu Lys His Leu Thr Asp Gly Met 

4698 ACA GTA AGA GAA [...] [...] AGT GCT [...] ATA 
116^Thr Val Arg Glu Leu Cys Ser Ala Ala Ile 

4728 ACC ATG AGT GAT [...] [...] GCG GCC [...] TTA 
126^Thr Met Ser Asp Asn Thr Ala Ala Asn Leu 

4758 CTT CTG ACA ACG ATC GGA GGA CCG AAG GAG 
136^Leu Leu Thr Thr Ile Gly Gly Pro Lys Glu 

4788 CTA ACC GCT TTT TTG CAC AAC ATG GGG GAT 
146^Leu Thr Ala Phe Leu His Asn Met Gly Asp 

4818 CAT GTA ACT CGC CTT GAT CGT TGG GAA CCG 
156^His Val Thr Arg Leu Asp Arg Trp Glu Pro 

4848 GAG CTG AAT GAA GCC ATA CCA AAC GAC GAG 
166^Glu Leu Asn Glu Ala Ile Pro Asn Asp Glu 

4878 CGT GAC ACC ACG ATG CCT GTA GCA ATG GCA 
176^Arg Asp Thr Thr Met Pro Val Ala Met Ala 

4908 ACA ACG TTG CGC AAA CTA TTA ACT GGC GAA 
186^Thr Thr Leu Arg Lys Leu Leu Thr Gly Glu 

4938 CTA CTT ACT CTA GCT TCC CGG CAA CAA TTA 
196^Leu Leu Thr Leu Ala Ser Arg Gln Gln Leu 
Figure 14b (cont'd) 

4968 ATA GAC TGG ATG GAG GCG GAT AAA GTT GCA 
206^Ile Asp Trp Met Glu Ala Asp Lys Val Ala 

4998 GGA CCA CTT CTG CGC TCG GCC CTT CCG GCT 
216^Gly Pro Leu Leu Arg Ser Ala Leu Pro Ala 

5028 GGC TGG TTT ATT GCT GAT AAA TCT GGA GCC 
226^Gly Trp Phe Ile Ala Asp Lys Ser Gly Ala 

5058 GGT GAG CGT GGG TCT CGC GGT ATC ATT GCA 
236^Gly Glu Arg Gly Ser Arg Gly Ile Ile Ala 

5088 GCA CTG GGG CCA GAT GGT AAG CCC TCC CGT 
246^Ala Leu Gly Pro Asp Gly Lys Pro Ser Arg 

5118 ATC GTA GTT ATC TAC ACG ACG GGG AGT CAG 
256^Ile Val Val Ile Tyr Thr Thr Gly Ser Gln 

5148 GCA ACT ATG GAT GAA CGA AAT AGA CAG ATC 
266^Ala Thr Met Asp Glu Arg Asn Arg Gln Ile 

5178 GCT GAG ATA GGT GCC TCA CTG ATT AAG CAT 
276^Ala Glu Ile Gly Ala Ser Leu Ile Lys His 

5208 TGG TAA CTGTCAGACCAAGTTTACTCATATATACTTT 
286^Trp ••• 



















5245 AGATTGATTTACGCGCCCTGTAGCGGCGCATTAAGCGCG 

5284 GCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTT 

5323 GCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCT 

5362 TCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCT 

5401 CTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCT 

5440 TTACGGCACCTCGACCCCAAAAAACTTGATTTGGGTGAT 

5479 GGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTT 






WO 99/29837 



PCT/EP98/07945 




Figure 14b (cont'd) 

5518 CGCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGA 
5557 CTCTTGTTCCAAACTTGAACAACACTCAACCCTATCTCG 
5596 GGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCG 
5635 GCCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTT 
5674 AACGCGAATTTTAACAAAATATTAACGTTTACAATTTAA 
5713 AAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGAC 
5752 CAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTC 
5791 AGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCC 
5830 TTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAA 
5869 ACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGA 
5908 GCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAG 
5947 AGCGCAGATACCAAATACTGTCCTTCTAGTGTAGCCGTA 
5986 GTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTAC 
6025 ATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGC 
6064 CAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAG 
6103 ACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAAC 
6142 GGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGAC 
6181 CTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGA 
6220 AAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTA 
6259 TCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAG 
6298 GGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCC 
6337 TGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTT

6376 GTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGC 
6415 CAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTG
6454 GCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGA
6493 TTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGA 
6532 TACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTC 
6571 AGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATTT 
6610 TCTCCTTACGCATCTGTGCGGTATTTCACACCGCATAGG 
6649 GTCATGGCTGCGCCCCGACACCCGCCAACACCCGCTGAC 
6688 GCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTAC 
6727 AGACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAG 
6766 AGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGCAA 
6805 GGAGATGGCGCCCAACAGTCCCCCGGCCACGGGGCCTGC 
6844 CACCATACCCACGCCGAAACAAGCGCTCATGAGCCCGAA 
6883 GTGGCGAGCCCGATCTTCCCCATCGGTGATGTCGGCGAT 
6922 ATAGGCGCCAGCAACCGCACCTGTGGCGCCGGTGATGCC 
6961 GGCCACGATGCGTCCGGCGTAGAGGATCTGCTCATGTTT 
7000 GACAGCTTATC

SEQUENCE LISTING



(1) GENERAL INFORMATION: 



(i) APPLICANT: 

(A) NAME: European Molecular Biology Laboratory (EMBL)

(B) STREET: Meyerhofstrasse 1

(C) CITY: Heidelberg 

(E) COUNTRY: DE 

(F) POSTAL CODE (ZIP) : D-69117 

(ii) TITLE OF INVENTION: Novel DNA Cloning Method 
(iii) NUMBER OF SEQUENCES: 14 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)



(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: EP 97121562.2 

(B) FILING DATE: 05-DEC-1997 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: EP 98118756.0 

(B) FILING DATE: 05-OCT-1998 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6150 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: both 



(vii) IMMEDIATE SOURCE: 

(B) CLONE: pBAD24-recET 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: complement (96..974)

(D) OTHER INFORMATION: /product = "araC"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1320..2162

(D) OTHER INFORMATION: /product = "t-recE" 






(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 2155..2972

(D) OTHER INFORMATION: /product = "recT"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3493..4353

(D) OTHER INFORMATION: /product = "bla" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

ATCGATGCAT AATGTGCCTG TCAAATGGAC GAAGCAGGGA TTCTGCAAAC CCTATGCTAC 60 

TCCGTCAAGC CGTCAATTGT CTGATTCGTT ACCAATTATG ACAACTTGAC GGCTACATCA 120

TTCACTTTTT CTTCACAACC GGCACGGAAC TCGCTCGGGC TGGCCCCGGT GCATTTTTTA 180 

AATACCCGCG AGAAATAGAG TTGATCGTCA AAACCAACAT TGCGACCGAC GGTGGCGATA 240 

GGCATCCGGG TGGTGCTCAA AAGCAGCTTC GCCTGGCTGA TACGTTGGTC CTCGCGCCAG 300

CTTAAGACGC TAATCCCTAA CTGCTGGCGG AAAAGATGTG ACAGACGCGA CGGCGACAAG 360

CAAACATGCT GTGCGACGCT GGCGATATCA AAATTGCTGT CTGCCAGGTG ATCGCTGATG 420

TACTGACAAG CCTCGCGTAC CCGATTATCC ATCGGTGGAT GGAGCGACTC GTTAATCGCT 480

TCCATGCGCC GCAGTAACAA TTGCTCAAGC AGATTTATCG CCAGCAGCTC CGAATAGCGC 540

CCTTCCCCTT GCCCGGCGTT AATGATTTGC CCAAACAGGT CGCTGAAATG CGGCTGGTGC 600

GCTTCATCCG GGCGAAAGAA CCCCGTATTG GCAAATATTG ACGGCCAGTT AAGCCATTCA 660

TGCCAGTAGG CGCGCGGACG AAAGTAAACC CACTGGTGAT ACCATTCGCG AGCCTCCGGA 720

TGACGACCGT AGTGATGAAT CTCTCCTGGC GGGAACAGCA AAATATCACC CGGTCGGCAA 780

ACAAATTCTC GTCCCTGATT TTTCACCACC CCCTGACCGC GAATGGTGAG ATTGAGAATA 840

TAACCTTTCA TTCCCAGCGG TCGGTCGATA AAAAAATCGA GATAACCGTT GGCCTCAATC 900 

GGCGTTAAAC CCGCCACCAG ATGGGCATTA AACGAGTATC CCGGCAGCAG GGGATCATTT 960 

TGCGCTTCAG CCATACTTTT CATACTCCCG CCATTCAGAG AAGAAACCAA TTGTCCATAT 1020 

TGCATCAGAC ATTGCCGTCA CTGCGTCTTT TACTGGCTCT TCTCGCTAAC CAAACCGGTA 1080 

ACCCCGCTTA TTAAAAGCAT TCTGTAACAA AGCGGGACCA AAGCCATGAC AAAAACGCGT 1140 

AACAAAAGTG TCTATAATCA CGGCAGAAAA GTCCACATTG ATTATTTGCA CGGCGTCACA 1200 

CTTTGCTATG CCATAGCATT TTTATCCATA AGATTAGCGG ATCCTACCTG ACGCTTTTTA 1260 

TCGCAACTCT CTACTGTTTC TCCATACCCG TTTTTTTGGG CTAGCAGGAG GAATTCACCA 1320 

TGGATCCCGT AATCGTAGAA GACATAGAGC CAGGTATTTA TTACGGAATT TCGAATGAGA 1380 

ATTACCACGC GGGTCCCGGT ATCAGTAAGT CTCAGCTCGA TGACATTGCT GATACTCCGG 1440

CACTATATTT GTGGCGTAAA AATGCCCCCG TGGACACCAC AAAGACAAAA ACGCTCGATT 1500

TAGGAACTGC TTTCCACTGC CGGGTACTTG AACCGGAAGA ATTCAGTAAC CGCTTTATCG 1560

TAGCACCTGA ATTTAACCGC CGTACAAACG CCGGAAAAGA AGAAGAGAAA GCGTTTCTGA 1620

TGGAATGCGC AAGCACAGGA AAAACGGTTA TCACTGCGGA AGAAGGCCGG AAAATTGAAC 1680 

TCATGTATCA AAGCGTTATG GCTTTGCCGC TGGGGCAATG GCTTGTTGAA AGCGCCGGAC 1740 

ACGCTGAATC ATCAATTTAC TGGGAAGATC CTGAAACAGG AATTTTGTGT CGGTGCCGTC 1800 

CGGACAAAAT TATCCCTGAA TTTCACTGGA TCATGGACGT GAAAACTACG GCGGATATTC 1860 

AACGATTCAA AACCGCTTAT TACGACTACC GCTATCACGT TCAGGATGCA TTCTACAGTG 1920 

ACGGTTATGA AGCACAGTTT GGAGTGCAGC CAACTTTCGT TTTTCTGGTT GCCAGCACAA 1980 

CTATTGAATG CGGACGTTAT CCGGTTGAAA TTTTCATGAT GGGCGAAGAA GCAAAACTGG 2040 

CAGGTCAACA GGAATATCAC CGCAATCTGC GAACCCTGTC TGACTGCCTG AATACCGATG 2100 

AATGGCCAGC TATTAAGACA TTATCACTGC CCCGCTGGGC TAAGGAATAT GCAAATGACT 2160 

AAGCAACCAC CAATCGCAAA AGCCGATCTG CAAAAAACTC AGGGAAACCG TGCACCAGCA 2220 

GCAGTTAAAA ATAGCGACGT GATTAGTTTT ATTAACCAGC CATCAATGAA AGAGCAACTG 2280 

GCAGCAGCTC TTCCACGCCA TATGACGGCT GAACGTATGA TCCGTATCGC CACCACAGAA 2340

ATTCGTAAAG TTCCGGCGTT AGGAAACTGT GACACTATGA GTTTTGTCAG TGCGATCGTA 2400

CAGTGTTCAC AGCTCGGACT TGAGCCAGGT AGCGCCCTCG GTCATGCATA TTTACTGCCT 2460 

TTTGGTAATA AAAACGAAAA GAGCGGTAAA AAGAACGTTC AGCTAATCAT TGGCTATCGC 2520 

GGCATGATTG ATCTGGCTCG CCGTTCTGGT CAAATCGCCA GCCTGTCAGC CCGTGTTGTC 2580 

CGTGAAGGTG ACGAGTTTAG CTTCGAATTT GGCCTTGATG AAAAGTTAAT ACACCGCCCG 2640

GGAGAAAACG AAGATGCCCC GGTTACCCAC GTCTATGCTG TCGCAAGACT GAAAGACGGA 2700 

GGTACTCAGT TTGAAGTTAT GACGCGCAAA CAGATTGAGC TGGTGCGCAG CCTGAGTAAA 2760 

GCTGGTAATA ACGGGCCGTG GGTAACTCAC TGGGAAGAAA TGGCAAAGAA AACGGCTATT 2820 

CGTCGCCTGT TCAAATATTT GCCCGTATCA ATTGAGATCC AGCGTGCAGT ATCAATGGAT 2880 

GAAAAGGAAC CACTGACAAT CGATCCTGCA GATTCCTCTG TATTAACCGG GGAATACAGT 2940 

GTAATCGATA ATTCAGAGGA ATAGATCTAA GCTTGGCTGT TTTGGCGGAT GAGAGAAGAT 3000 

TTTCAGCCTG ATACAGATTA AATCAGAACG CAGAAGCGGT CTGATAAAAC AGAATTTGCC 3060

TGGCGGCAGT AGCGCGGTGG TCCCACCTGA CCCCATGCCG AACTCAGAAG TGAAACGCCG 3120 

TAGCGCCGAT GGTAGTGTGG GGTCTCCCCA TGCGAGAGTA GGGAACTGCC AGGCATCAAA 3180 

TAAAACGAAA GGCTCAGTCG AAAGACTGGG CCTTTCGTTT TATCTGTTGT TTGTCGGTGA 3240 

ACGCTCTCCT GAGTAGGACA AATCCGCCGG GAGCGGATTT GAACGTTGCG AAGCAACGGC 3300 

CCGGAGGGTG GCGGGCAGGA CGCCCGCCAT AAACTGCCAG GCATCAAATT AAGCAGAAGG 3360 

CCATCCTGAC GGATGGCCTT TTTGCGTTTC TACAAACTCT TTTGTTTATT TTTCTAAATA 3420 

CATTCAAATA TGTATCCGCT CATGAGACAA TAACCCTGAT AAATGCTTCA ATAATATTGA 3480 

AAAAGGAAGA GTATGAGTAT TCAACATTTC CGTGTCGCCC TTATTCCCTT TTTTGCGGCA 3540

TTTTGCCTTC CTGTTTTTGC TCACCCAGAA ACGCTGGTGA AAGTAAAAGA TGCTGAAGAT 3600

CAGTTGGGTG CACGAGTGGG TTACATCGAA CTGGATCTCA ACAGCGGTAA GATCCTTGAG 3660

AGTTTTCGCC CCGAAGAACG TTTTCCAATG ATGAGCACTT TTAAAGTTCT GCTATGTGGC 3720

GCGGTATTAT CCCGTGTTGA CGCCGGGCAA GAGCAACTCG GTCGCCGCAT ACACTATTCT 3780 

CAGAATGACT TGGTTGAGTA CTCACCAGTC ACAGAAAAGC ATCTTACGGA TGGCATGACA 3840

GTAAGAGAAT TATGCAGTGC TGCCATAACC ATGAGTGATA ACACTGCGGC CAACTTACTT 3900

CTGACAACGA TCGGAGGACC GAAGGAGCTA ACCGCTTTTT TGCACAACAT GGGGGATCAT 3960

GTAACTCGCC TTGATCGTTG GGAACCGGAG CTGAATGAAG CCATACCAAA CGACGAGCGT 4020

GACACCACGA TGCCTGTAGC AATGGCAACA ACGTTGCGCA AACTATTAAC TGGCGAACTA 4080

CTTACTCTAG CTTCCCGGCA ACAATTAATA GACTGGATGG AGGCGGATAA AGTTGCAGGA 4140 

CCACTTCTGC GCTCGGCCCT TCCGGCTGGC TGGTTTATTG CTGATAAATC TGGAGCCGGT 4200 

GAGCGTGGGT CTCGCGGTAT CATTGCAGCA CTGGGGCCAG ATGGTAAGCC CTCCCGTATC 4260 

GTAGTTATCT ACACGACGGG GAGTCAGGCA ACTATGGATG AACGAAATAG ACAGATCGCT 4320

GAGATAGGTG CCTCACTGAT TAAGCATTGG TAACTGTCAG ACCAAGTTTA CTCATATATA 4380

CTTTAGATTG ATTTACGCGC CCTGTAGCGG CGCATTAAGC GCGGCGGGTG TGGTGGTTAC 4440

GCGCAGCGTG ACCGCTACAC TTGCCAGCGC CCTAGCGCCC GCTCCTTTCG CTTTCTTCCC 4500 

TTCCTTTCTC GCCACGTTCG CCGGCTTTCC CCGTCAAGCT CTAAATCGGG GGCTCCCTTT 4560 

AGGGTTCCGA TTTAGTGCTT TACGGCACCT CGACCCCAAA AAACTTGATT TGGGTGATGG 4620

TTCACGTAGT GGGCCATCGC CCTGATAGAC GGTTTTTCGC CCTTTGACGT TGGAGTCCAC 4680

GTTCTTTAAT AGTGGACTCT TGTTCCAAAC TTGAACAACA CTCAACCCTA TCTCGGGCTA 4740

TTCTTTTGAT TTATAAGGGA TTTTGCCGAT TTCGGCCTAT TGGTTAAAAA ATGAGCTGAT 4800

TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAAAGGATCT 4860

AGGTGAAGAT CCTTTTTGAT AATCTCATGA CCAAAATCCC TTAACGTGAG TTTTCGTTCC 4920

ACTGAGCGTC AGACCCCGTA GAAAAGATCA AAGGATCTTC TTGAGATCCT TTTTTTCTGC 4980

GCGTAATCTG CTGCTTGCAA ACAAAAAAAC CACCGCTACC AGCGGTGGTT TGTTTGCCGG 5040

ATCAAGAGCT ACCAACTCTT TTTCCGAAGG TAACTGGCTT CAGCAGAGCG CAGATACCAA 5100 

ATACTGTCCT TCTAGTGTAG CCGTAGTTAG GCCACCACTT CAAGAACTCT GTAGCACCGC 5160 

CTACATACCT CGCTCTGCTA ATCCTGTTAC CAGTGGCTGC TGCCAGTGGC GATAAGTCGT 5220 

GTCTTACCGG GTTGGACTCA AGACGATAGT TACCGGATAA GGCGCAGCGG TCGGGCTGAA 5280 

CGGGGGGTTC GTGCACACAG CCCAGCTTGG AGCGAACGAC CTACACCGAA CTGAGATACC 5340 

TACAGCGTGA GCTATGAGAA AGCGCCACGC TTCCCGAAGG GAGAAAGGCG GACAGGTATC 5400 

CGGTAAGCGG CAGGGTCGGA ACAGGAGAGC GCACGAGGGA GCTTCCAGGG GGAAACGCCT 5460 

GGTATCTTTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA TTTTTGTGAT 5520 

GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA CGCGGCCTTT TTACGGTTCC 5580

TGGCCTTTTG CTGGCCTTTT GCTCACATGT TCTTTCCTGC GTTATCCCCT GATTCTGTGG 5640

ATAACCGTAT TACCGCCTTT GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC 5700

GCAGCGAGTC AGTGAGCGAG GAAGCGGAAG AGCGCCTGAT GCGGTATTTT CTCCTTACGC 5760 

ATCTGTGCGG TATTTCACAC CGCATAGGGT CATGGCTGCG CCCCGACACC CGCCAACACC 5820 

CGCTGACGCG CCCTGACGGG CTTGTCTGCT CCCGGCATCC GCTTACAGAC AAGCTGTGAC 5880 

CGTCTCCGGG AGCTGCATGT GTCAGAGGTT TTCACCGTCA TCACCGAAAC GCGCGAGGCA 5940 

GCAAGGAGAT GGCGCCCAAC AGTCCCCCGG CCACGGGGCC TGCCACCATA CCCACGCCGA 6000 

AACAAGCGCT CATGAGCCCG AAGTGGCGAG CCCGATCTTC CCCATCGGTG ATGTCGGCGA 6060 

TATAGGCGCC AGCAACCGCA CCTGTGGCGC CGGTGATGCC GGCCACGATG CGTCCGGCGT 6120 

AGAGGATCTG CTCATGTTTG ACAGCTTATC 6150 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 843 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(vii) IMMEDIATE SOURCE:
(B) CLONE: t-recE

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..843

(D) OTHER INFORMATION: /product = "t-recE" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

ATG GAT CCC GTA ATC GTA GAA GAC ATA GAG CCA GGT ATT TAT TAC GGA 48
Met Asp Pro Val Ile Val Glu Asp Ile Glu Pro Gly Ile Tyr Tyr Gly
1 5 10 15

ATT TCG AAT GAG AAT TAC CAC GCG GGT CCC GGT ATC AGT AAG TCT CAG 96
Ile Ser Asn Glu Asn Tyr His Ala Gly Pro Gly Ile Ser Lys Ser Gln
20 25 30

CTC GAT GAC ATT GCT GAT ACT CCG GCA CTA TAT TTG TGG CGT AAA AAT 144
Leu Asp Asp Ile Ala Asp Thr Pro Ala Leu Tyr Leu Trp Arg Lys Asn
35 40 45

GCC CCC GTG GAC ACC ACA AAG ACA AAA ACG CTC GAT TTA GGA ACT GCT 192
Ala Pro Val Asp Thr Thr Lys Thr Lys Thr Leu Asp Leu Gly Thr Ala
50 55 60

TTC CAC TGC CGG GTA CTT GAA CCG GAA GAA TTC AGT AAC CGC TTT ATC 240
Phe His Cys Arg Val Leu Glu Pro Glu Glu Phe Ser Asn Arg Phe Ile
65 70 75 80

GTA GCA CCT GAA TTT AAC CGC CGT ACA AAC GCC GGA AAA GAA GAA GAG 288
Val Ala Pro Glu Phe Asn Arg Arg Thr Asn Ala Gly Lys Glu Glu Glu
85 90 95

AAA GCG TTT CTG ATG GAA TGC GCA AGC ACA GGA AAA ACG GTT ATC ACT 336
Lys Ala Phe Leu Met Glu Cys Ala Ser Thr Gly Lys Thr Val Ile Thr
100 105 110

GCG GAA GAA GGC CGG AAA ATT GAA CTC ATG TAT CAA AGC GTT ATG GCT 384
Ala Glu Glu Gly Arg Lys Ile Glu Leu Met Tyr Gln Ser Val Met Ala
115 120 125

TTG CCG CTG GGG CAA TGG CTT GTT GAA AGC GCC GGA CAC GCT GAA TCA 432
Leu Pro Leu Gly Gln Trp Leu Val Glu Ser Ala Gly His Ala Glu Ser
130 135 140

TCA ATT TAC TGG GAA GAT CCT GAA ACA GGA ATT TTG TGT CGG TGC CGT 480
Ser Ile Tyr Trp Glu Asp Pro Glu Thr Gly Ile Leu Cys Arg Cys Arg
145 150 155 160

CCG GAC AAA ATT ATC CCT GAA TTT CAC TGG ATC ATG GAC GTG AAA ACT 528
Pro Asp Lys Ile Ile Pro Glu Phe His Trp Ile Met Asp Val Lys Thr
165 170 175

ACG GCG GAT ATT CAA CGA TTC AAA ACC GCT TAT TAC GAC TAC CGC TAT 576
Thr Ala Asp Ile Gln Arg Phe Lys Thr Ala Tyr Tyr Asp Tyr Arg Tyr
180 185 190

CAC GTT CAG GAT GCA TTC TAC AGT GAC GGT TAT GAA GCA CAG TTT GGA 624
His Val Gln Asp Ala Phe Tyr Ser Asp Gly Tyr Glu Ala Gln Phe Gly
195 200 205

GTG CAG CCA ACT TTC GTT TTT CTG GTT GCC AGC ACA ACT ATT GAA TGC 672
Val Gln Pro Thr Phe Val Phe Leu Val Ala Ser Thr Thr Ile Glu Cys
210 215 220

GGA CGT TAT CCG GTT GAA ATT TTC ATG ATG GGC GAA GAA GCA AAA CTG 720
Gly Arg Tyr Pro Val Glu Ile Phe Met Met Gly Glu Glu Ala Lys Leu
225 230 235 240

GCA GGT CAA CAG GAA TAT CAC CGC AAT CTG CGA ACC CTG TCT GAC TGC 768
Ala Gly Gln Gln Glu Tyr His Arg Asn Leu Arg Thr Leu Ser Asp Cys
245 250 255

CTG AAT ACC GAT GAA TGG CCA GCT ATT AAG ACA TTA TCA CTG CCC CGC 816
Leu Asn Thr Asp Glu Trp Pro Ala Ile Lys Thr Leu Ser Leu Pro Arg
260 265 270

TGG GCT AAG GAA TAT GCA AAT GAC TAA 843
Trp Ala Lys Glu Tyr Ala Asn Asp *
275 280

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 281 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Asp Pro Val Ile Val Glu Asp Ile Glu Pro Gly Ile Tyr Tyr Gly
1 5 10 15

Ile Ser Asn Glu Asn Tyr His Ala Gly Pro Gly Ile Ser Lys Ser Gln
20 25 30

Leu Asp Asp Ile Ala Asp Thr Pro Ala Leu Tyr Leu Trp Arg Lys Asn
35 40 45

Ala Pro Val Asp Thr Thr Lys Thr Lys Thr Leu Asp Leu Gly Thr Ala
50 55 60

Phe His Cys Arg Val Leu Glu Pro Glu Glu Phe Ser Asn Arg Phe Ile
65 70 75 80

Val Ala Pro Glu Phe Asn Arg Arg Thr Asn Ala Gly Lys Glu Glu Glu
85 90 95

Lys Ala Phe Leu Met Glu Cys Ala Ser Thr Gly Lys Thr Val Ile Thr
100 105 110

Ala Glu Glu Gly Arg Lys Ile Glu Leu Met Tyr Gln Ser Val Met Ala
115 120 125

Leu Pro Leu Gly Gln Trp Leu Val Glu Ser Ala Gly His Ala Glu Ser
130 135 140

Ser Ile Tyr Trp Glu Asp Pro Glu Thr Gly Ile Leu Cys Arg Cys Arg
145 150 155 160

Pro Asp Lys Ile Ile Pro Glu Phe His Trp Ile Met Asp Val Lys Thr
165 170 175

Thr Ala Asp Ile Gln Arg Phe Lys Thr Ala Tyr Tyr Asp Tyr Arg Tyr
180 185 190

His Val Gln Asp Ala Phe Tyr Ser Asp Gly Tyr Glu Ala Gln Phe Gly
195 200 205

Val Gln Pro Thr Phe Val Phe Leu Val Ala Ser Thr Thr Ile Glu Cys
210 215 220

Gly Arg Tyr Pro Val Glu Ile Phe Met Met Gly Glu Glu Ala Lys Leu
225 230 235 240

Ala Gly Gln Gln Glu Tyr His Arg Asn Leu Arg Thr Leu Ser Asp Cys
245 250 255

Leu Asn Thr Asp Glu Trp Pro Ala Ile Lys Thr Leu Ser Leu Pro Arg
260 265 270

Trp Ala Lys Glu Tyr Ala Asn Asp *
275 280

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 810 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(vii) IMMEDIATE SOURCE:
(B) CLONE: recT

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..810

(D) OTHER INFORMATION: /product = "recT"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

ATG ACT AAG CAA CCA CCA ATC GCA AAA GCC GAT CTG CAA AAA ACT CAG 48
Met Thr Lys Gln Pro Pro Ile Ala Lys Ala Asp Leu Gln Lys Thr Gln
285 290 295

GGA AAC CGT GCA CCA GCA GCA GTT AAA AAT AGC GAC GTG ATT AGT TTT 96
Gly Asn Arg Ala Pro Ala Ala Val Lys Asn Ser Asp Val Ile Ser Phe
300 305 310

ATT AAC CAG CCA TCA ATG AAA GAG CAA CTG GCA GCA GCT CTT CCA CGC 144
Ile Asn Gln Pro Ser Met Lys Glu Gln Leu Ala Ala Ala Leu Pro Arg
315 320 325

CAT ATG ACG GCT GAA CGT ATG ATC CGT ATC GCC ACC ACA GAA ATT CGT 192
His Met Thr Ala Glu Arg Met Ile Arg Ile Ala Thr Thr Glu Ile Arg
330 335 340 345

AAA GTT CCG GCG TTA GGA AAC TGT GAC ACT ATG AGT TTT GTC AGT GCG 240
Lys Val Pro Ala Leu Gly Asn Cys Asp Thr Met Ser Phe Val Ser Ala
350 355 360

ATC GTA CAG TGT TCA CAG CTC GGA CTT GAG CCA GGT AGC GCC CTC GGT 288
Ile Val Gln Cys Ser Gln Leu Gly Leu Glu Pro Gly Ser Ala Leu Gly
365 370 375

CAT GCA TAT TTA CTG CCT TTT GGT AAT AAA AAC GAA AAG AGC GGT AAA 336
His Ala Tyr Leu Leu Pro Phe Gly Asn Lys Asn Glu Lys Ser Gly Lys
380 385 390

AAG AAC GTT CAG CTA ATC ATT GGC TAT CGC GGC ATG ATT GAT CTG GCT 384
Lys Asn Val Gln Leu Ile Ile Gly Tyr Arg Gly Met Ile Asp Leu Ala
395 400 405

CGC CGT TCT GGT CAA ATC GCC AGC CTG TCA GCC CGT GTT GTC CGT GAA 432
Arg Arg Ser Gly Gln Ile Ala Ser Leu Ser Ala Arg Val Val Arg Glu
410 415 420 425

GGT GAC GAG TTT AGC TTC GAA TTT GGC CTT GAT GAA AAG TTA ATA CAC 480
Gly Asp Glu Phe Ser Phe Glu Phe Gly Leu Asp Glu Lys Leu Ile His
430 435 440

CGC CCG GGA GAA AAC GAA GAT GCC CCG GTT ACC CAC GTC TAT GCT GTC 528
Arg Pro Gly Glu Asn Glu Asp Ala Pro Val Thr His Val Tyr Ala Val
445 450 455

GCA AGA CTG AAA GAC GGA GGT ACT CAG TTT GAA GTT ATG ACG CGC AAA 576
Ala Arg Leu Lys Asp Gly Gly Thr Gln Phe Glu Val Met Thr Arg Lys
460 465 470

CAG ATT GAG CTG GTG CGC AGC CTG AGT AAA GCT GGT AAT AAC GGG CCG 624
Gln Ile Glu Leu Val Arg Ser Leu Ser Lys Ala Gly Asn Asn Gly Pro
475 480 485

TGG GTA ACT CAC TGG GAA GAA ATG GCA AAG AAA ACG GCT ATT CGT CGC 672
Trp Val Thr His Trp Glu Glu Met Ala Lys Lys Thr Ala Ile Arg Arg
490 495 500 505

CTG TTC AAA TAT TTG CCC GTA TCA ATT GAG ATC CAG CGT GCA GTA TCA 720
Leu Phe Lys Tyr Leu Pro Val Ser Ile Glu Ile Gln Arg Ala Val Ser
510 515 520

ATG GAT GAA AAG GAA CCA CTG ACA ATC GAT CCT GCA GAT TCC TCT GTA 768
Met Asp Glu Lys Glu Pro Leu Thr Ile Asp Pro Ala Asp Ser Ser Val
525 530 535

TTA ACC GGG GAA TAC AGT GTA ATC GAT AAT TCA GAG GAA TAG 810
Leu Thr Gly Glu Tyr Ser Val Ile Asp Asn Ser Glu Glu *
540 545 550

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 270 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Met Thr Lys Gln Pro Pro Ile Ala Lys Ala Asp Leu Gln Lys Thr Gln
1 5 10 15

Gly Asn Arg Ala Pro Ala Ala Val Lys Asn Ser Asp Val Ile Ser Phe
20 25 30

Ile Asn Gln Pro Ser Met Lys Glu Gln Leu Ala Ala Ala Leu Pro Arg
35 40 45

His Met Thr Ala Glu Arg Met Ile Arg Ile Ala Thr Thr Glu Ile Arg
50 55 60

Lys Val Pro Ala Leu Gly Asn Cys Asp Thr Met Ser Phe Val Ser Ala
65 70 75 80

Ile Val Gln Cys Ser Gln Leu Gly Leu Glu Pro Gly Ser Ala Leu Gly
85 90 95

His Ala Tyr Leu Leu Pro Phe Gly Asn Lys Asn Glu Lys Ser Gly Lys
100 105 110

Lys Asn Val Gln Leu Ile Ile Gly Tyr Arg Gly Met Ile Asp Leu Ala
115 120 125

Arg Arg Ser Gly Gln Ile Ala Ser Leu Ser Ala Arg Val Val Arg Glu
130 135 140

Gly Asp Glu Phe Ser Phe Glu Phe Gly Leu Asp Glu Lys Leu Ile His
145 150 155 160

Arg Pro Gly Glu Asn Glu Asp Ala Pro Val Thr His Val Tyr Ala Val
165 170 175

Ala Arg Leu Lys Asp Gly Gly Thr Gln Phe Glu Val Met Thr Arg Lys
180 185 190

Gln Ile Glu Leu Val Arg Ser Leu Ser Lys Ala Gly Asn Asn Gly Pro
195 200 205

Trp Val Thr His Trp Glu Glu Met Ala Lys Lys Thr Ala Ile Arg Arg
210 215 220

Leu Phe Lys Tyr Leu Pro Val Ser Ile Glu Ile Gln Arg Ala Val Ser
225 230 235 240

Met Asp Glu Lys Glu Pro Leu Thr Ile Asp Pro Ala Asp Ser Ser Val
245 250 255

Leu Thr Gly Glu Tyr Ser Val Ile Asp Asn Ser Glu Glu *
260 265 270



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 876 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(vii) IMMEDIATE SOURCE:
(B) CLONE: araC

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: complement (1..876)
(D) OTHER INFORMATION: /product = "araC"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TGACAACTTG ACGGCTACAT CATTCACTTT TTCTTCACAA CCGGCACGGA ACTCGCTCGG 60

GCTGGCCCCG GTGCATTTTT TAAATACCCG CGAGAAATAG AGTTGATCGT CAAAACCAAC 120

ATTGCGACCG ACGGTGGCGA TAGGCATCCG GGTGGTGCTC AAAAGCAGCT TCGCCTGGCT 180

GATACGTTGG TCCTCGCGCC AGCTTAAGAC GCTAATCCCT AACTGCTGGC GGAAAAGATG 240

TGACAGACGC GACGGCGACA AGCAAACATG CTGTGCGACG CTGGCGATAT CAAAATTGCT 300

GTCTGCCAGG TGATCGCTGA TGTACTGACA AGCCTCGCGT ACCCGATTAT CCATCGGTGG 360

ATGGAGCGAC TCGTTAATCG CTTCCATGCG CCGCAGTAAC AATTGCTCAA GCAGATTTAT 420

CGCCAGCAGC TCCGAATAGC GCCCTTCCCC TTGCCCGGCG TTAATGATTT GCCCAAACAG 480

GTCGCTGAAA TGCGGCTGGT GCGCTTCATC CGGGCGAAAG AACCCCGTAT TGGCAAATAT 540

TGACGGCCAG TTAAGCCATT CATGCCAGTA GGCGCGCGGA CGAAAGTAAA CCCACTGGTG 600

ATACCATTCG CGAGCCTCCG GATGACGACC GTAGTGATGA ATCTCTCCTG GCGGGAACAG 660

CAAAATATCA CCCGGTCGGC AAACAAATTC TCGTCCCTGA TTTTTCACCA CCCCCTGACC 720

GCGAATGGTG AGATTGAGAA TATAACCTTT CATTCCCAGC GGTCGGTCGA TAAAAAAATC 780

GAGATAACCG TTGGCCTCAA TCGGCGTTAA ACCCGCCACC AGATGGGCAT TAAACGAGTA 840

TCCCGGCAGC AGGGGATCAT TTTGCGCTTC AGCCAT 876

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 292 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Met Ala Glu Ala Gln Asn Asp Pro Leu Leu Pro Gly Tyr Ser Phe Asn
1 5 10 15

Ala His Leu Val Ala Gly Leu Thr Pro Ile Glu Ala Asn Gly Tyr Leu
20 25 30

Asp Phe Phe Ile Asp Arg Pro Leu Gly Met Lys Gly Tyr Ile Leu Asn
35 40 45

Leu Thr Ile Arg Gly Gln Gly Val Val Lys Asn Gln Gly Arg Glu Phe
50 55 60

Val Cys Arg Pro Gly Asp Ile Leu Leu Phe Pro Pro Gly Glu Ile His
65 70 75 80

His Tyr Gly Arg His Pro Glu Ala Arg Glu Trp Tyr His Gln Trp Val
85 90 95

Tyr Phe Arg Pro Arg Ala Tyr Trp His Glu Trp Leu Asn Trp Pro Ser
100 105 110

Ile Phe Ala Asn Thr Gly Phe Phe Arg Pro Asp Glu Ala His Gln Pro
115 120 125

His Phe Ser Asp Leu Phe Gly Gln Ile Ile Asn Ala Gly Gln Gly Glu
130 135 140

Gly Arg Tyr Ser Glu Leu Leu Ala Ile Asn Leu Leu Glu Gln Leu Leu
145 150 155 160

Leu Arg Arg Met Glu Ala Ile Asn Glu Ser Leu His Pro Pro Met Asp
165 170 175

Asn Arg Val Arg Glu Ala Cys Gln Tyr Ile Ser Asp His Leu Ala Asp
180 185 190

Ser Asn Phe Asp Ile Ala Ser Val Ala Gln His Val Cys Leu Ser Pro
195 200 205

Ser Arg Leu Ser His Leu Phe Arg Gln Gln Leu Gly Ile Ser Val Leu
210 215 220

Ser Trp Arg Glu Asp Gln Arg Ile Ser Gln Ala Lys Leu Leu Leu Ser
225 230 235 240

Thr Thr Arg Met Pro Ile Ala Thr Val Gly Arg Asn Val Gly Phe Asp
245 250 255

Asp Gln Leu Tyr Phe Ser Arg Val Phe Lys Lys Cys Thr Gly Ala Ser
260 265 270

Pro Ser Glu Phe Arg Ala Gly Cys Glu Glu Lys Val Asn Asp Val Ala
275 280 285

Val Lys Leu Ser
290



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 861 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(vii) IMMEDIATE SOURCE:
(B) CLONE: bla

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..861

(D) OTHER INFORMATION: /product = "bla"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

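ATG AGT ATT CAA CAT TTC CGT GTC GCC CTT ATT CCC TTT TTT GCG GCA 48
Met Ser Ile Gln His Phe Arg Val Ala Leu Ile Pro Phe Phe Ala Ala
300 305 310

TTT TGC CTT CCT GTT TTT GCT CAC CCA GAA ACG CTG GTG AAA GTA AAA 96
Phe Cys Leu Pro Val Phe Ala His Pro Glu Thr Leu Val Lys Val Lys
315 320 325

GAT GCT GAA GAT CAG TTG GGT GCA CGA GTG GGT TAC ATC GAA CTG GAT 144
Asp Ala Glu Asp Gln Leu Gly Ala Arg Val Gly Tyr Ile Glu Leu Asp
330 335 340

CTC AAC AGC GGT AAG ATC CTT GAG AGT TTT CGC CCC GAA GAA CGT TTT 192
Leu Asn Ser Gly Lys Ile Leu Glu Ser Phe Arg Pro Glu Glu Arg Phe
345 350 355

CCA ATG ATG AGC ACT TTT AAA GTT CTG CTA TGT GGC GCG GTA TTA TCC 240
Pro Met Met Ser Thr Phe Lys Val Leu Leu Cys Gly Ala Val Leu Ser
360 365 370 375

CGT GTT GAC GCC GGG CAA GAG CAA CTC GGT CGC CGC ATA CAC TAT TCT 288
Arg Val Asp Ala Gly Gln Glu Gln Leu Gly Arg Arg Ile His Tyr Ser
380 385 390

CAG AAT GAC TTG GTT GAG TAC TCA CCA GTC ACA GAA AAG CAT CTT ACG 336
Gln Asn Asp Leu Val Glu Tyr Ser Pro Val Thr Glu Lys His Leu Thr
395 400 405

GAT GGC ATG ACA GTA AGA GAA TTA TGC AGT GCT GCC ATA ACC ATG AGT 384
Asp Gly Met Thr Val Arg Glu Leu Cys Ser Ala Ala Ile Thr Met Ser
410 415 420

GAT AAC ACT GCG GCC AAC TTA CTT CTG ACA ACG ATC GGA GGA CCG AAG 432
Asp Asn Thr Ala Ala Asn Leu Leu Leu Thr Thr Ile Gly Gly Pro Lys
425 430 435

GAG CTA ACC GCT TTT TTG CAC AAC ATG GGG GAT CAT GTA ACT CGC CTT 480
Glu Leu Thr Ala Phe Leu His Asn Met Gly Asp His Val Thr Arg Leu
440 445 450 455

GAT CGT TGG GAA CCG GAG CTG AAT GAA GCC ATA CCA AAC GAC GAG CGT 528
Asp Arg Trp Glu Pro Glu Leu Asn Glu Ala Ile Pro Asn Asp Glu Arg
460 465 470

GAC ACC ACG ATG CCT GTA GCA ATG GCA ACA ACG TTG CGC AAA CTA TTA 576
Asp Thr Thr Met Pro Val Ala Met Ala Thr Thr Leu Arg Lys Leu Leu
475 480 485

ACT GGC GAA CTA CTT ACT CTA GCT TCC CGG CAA CAA TTA ATA GAC TGG 624
Thr Gly Glu Leu Leu Thr Leu Ala Ser Arg Gln Gln Leu Ile Asp Trp
490 495 500

ATG GAG GCG GAT AAA GTT GCA GGA CCA CTT CTG CGC TCG GCC CTT CCG 672
Met Glu Ala Asp Lys Val Ala Gly Pro Leu Leu Arg Ser Ala Leu Pro
505 510 515

GCT GGC TGG TTT ATT GCT GAT AAA TCT GGA GCC GGT GAG CGT GGG TCT 720
Ala Gly Trp Phe Ile Ala Asp Lys Ser Gly Ala Gly Glu Arg Gly Ser
520 525 530 535

CGC GGT ATC ATT GCA GCA CTG GGG CCA GAT GGT AAG CCC TCC CGT ATC 768
Arg Gly Ile Ile Ala Ala Leu Gly Pro Asp Gly Lys Pro Ser Arg Ile
540 545 550

GTA GTT ATC TAC ACG ACG GGG AGT CAG GCA ACT ATG GAT GAA CGA AAT 816
Val Val Ile Tyr Thr Thr Gly Ser Gln Ala Thr Met Asp Glu Arg Asn
555 560 565

AGA CAG ATC GCT GAG ATA GGT GCC TCA CTG ATT AAG CAT TGG TAA 861
Arg Gln Ile Ala Glu Ile Gly Ala Ser Leu Ile Lys His Trp *
570 575 580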

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 287 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Ser Ile Gln His Phe Arg Val Ala Leu Ile Pro Phe Phe Ala Ala
1 5 10 15

Phe Cys Leu Pro Val Phe Ala His Pro Glu Thr Leu Val Lys Val Lys
20 25 30

Asp Ala Glu Asp Gln Leu Gly Ala Arg Val Gly Tyr Ile Glu Leu Asp
35 40 45

Leu Asn Ser Gly Lys Ile Leu Glu Ser Phe Arg Pro Glu Glu Arg Phe
50 55 60

Pro Met Met Ser Thr Phe Lys Val Leu Leu Cys Gly Ala Val Leu Ser
65 70 75 80

Arg Val Asp Ala Gly Gln Glu Gln Leu Gly Arg Arg Ile His Tyr Ser
85 90 95

Gln Asn Asp Leu Val Glu Tyr Ser Pro Val Thr Glu Lys His Leu Thr
100 105 110

Asp Gly Met Thr Val Arg Glu Leu Cys Ser Ala Ala Ile Thr Met Ser
115 120 125

Asp Asn Thr Ala Ala Asn Leu Leu Leu Thr Thr Ile Gly Gly Pro Lys
130 135 140

Glu Leu Thr Ala Phe Leu His Asn Met Gly Asp His Val Thr Arg Leu
145 150 155 160

Asp Arg Trp Glu Pro Glu Leu Asn Glu Ala Ile Pro Asn Asp Glu Arg
165 170 175

Asp Thr Thr Met Pro Val Ala Met Ala Thr Thr Leu Arg Lys Leu Leu
180 185 190

Thr Gly Glu Leu Leu Thr Leu Ala Ser Arg Gln Gln Leu Ile Asp Trp
195 200 205

Met Glu Ala Asp Lys Val Ala Gly Pro Leu Leu Arg Ser Ala Leu Pro
210 215 220

Ala Gly Trp Phe Ile Ala Asp Lys Ser Gly Ala Gly Glu Arg Gly Ser
225 230 235 240

Arg Gly Ile Ile Ala Ala Leu Gly Pro Asp Gly Lys Pro Ser Arg Ile
245 250 255

Val Val Ile Tyr Thr Thr Gly Ser Gln Ala Thr Met Asp Glu Arg Asn
260 265 270

Arg Gln Ile Ala Glu Ile Gly Ala Ser Leu Ile Lys His Trp *
275 280 285

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7195 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(vii) IMMEDIATE SOURCE:

(B) CLONE: pBAD-ETgamma

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3588..4004

(D) OTHER INFORMATION: /product = "red gamma"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
ATCGATGCAT AATGTGCCTG TCAAATGGAC GAAGCAGGGA TTCTGCAAAC CCTATGCTAC 60

TCCGTCAAGC CGTCAATTGT CTGATTCGTT ACCAATTATG ACAACTTGAC GGCTACATCA 120

TTCACTTTTT CTTCACAACC GGCACGGAAC TCGCTCGGGC TGGCCCCGGT GCATTTTTTA 180

AATACCCGCG AGAAATAGAG TTGATCGTCA AAACCAACAT TGCGACCGAC GGTGGCGATA 240

GGCATCCGGG TGGTGCTCAA AAGCAGCTTC GCCTGGCTGA TACGTTGGTC CTCGCGCCAG 300

CTTAAGACGC TAATCCCTAA CTGCTGGCGG AAAAGATGTG ACAGACGCGA CGGCGACAAG 360

CAAACATGCT GTGCGACGCT GGCGATATCA AAATTGCTGT CTGCCAGGTG ATCGCTGATG 420

TACTGACAAG CCTCGCGTAC CCGATTATCC ATCGGTGGAT GGAGCGACTC GTTAATCGCT 480

TCCATGCGCC GCAGTAACAA TTGCTCAAGC AGATTTATCG CCAGCAGCTC CGAATAGCGC 540

CCTTCCCCTT GCCCGGCGTT AATGATTTGC CCAAACAGGT CGCTGAAATG CGGCTGGTGC 600

GCTTCATCCG GGCGAAAGAA CCCCGTATTG GCAAATATTG ACGGCCAGTT AAGCCATTCA 660

TGCCAGTAGG CGCGCGGACG AAAGTAAACC CACTGGTGAT ACCATTCGCG AGCCTCCGGA 720

TGACGACCGT AGTGATGAAT CTCTCCTGGC GGGAACAGCA AAATATCACC CGGTCGGCAA 780

ACAAATTCTC GTCCCTGATT TTTCACCACC CCCTGACCGC GAATGGTGAG ATTGAGAATA 840

TAACCTTTCA TTCCCAGCGG TCGGTCGATA AAAAAATCGA GATAACCGTT GGCCTCAATC 900

GGCGTTAAAC CCGCCACCAG ATGGGCATTA AACGAGTATC CCGGCAGCAG GGGATCATTT 960

TGCGCTTCAG CCATACTTTT CATACTCCCG CCATTCAGAG AAGAAACCAA TTGTCCATAT 1020

TGCATCAGAC ATTGCCGTCA CTGCGTCTTT TACTGGCTCT TCTCGCTAAC CAAACCGGTA 1080

ACCCCGCTTA TTAAAAGCAT TCTGTAACAA AGCGGGACCA AAGCCATGAC AAAAACGCGT 1140

AACAAAAGTG TCTATAATCA CGGCAGAAAA GTCCACATTG ATTATTTGCA CGGCGTCACA 1200

CTTTGCTATG CCATAGCATT TTTATCCATA AGATTAGCGG ATCCTACCTG ACGCTTTTTA 1260

TCGCAACTCT CTACTGTTTC TCCATACCCG TTTTTTTGGG CTAGCAGGAG GAATTCACCA 1320

TGGATCCCGT AATCGTAGAA GACATAGAGC CAGGTATTTA TTACGGAATT TCGAATGAGA 1380

ATTACCACGC GGGTCCCGGT ATCAGTAAGT CTCAGCTCGA TGACATTGCT GATACTCCGG 1440

CACTATATTT GTGGCGTAAA AATGCCCCCG TGGACACCAC AAAGACAAAA ACGCTCGATT 1500

TAGGAACTGC TTTCCACTGC CGGGTACTTG AACCGGAAGA ATTCAGTAAC CGCTTTATCG 1560

TAGCACCTGA ATTTAACCGC CGTACAAACG CCGGAAAAGA AGAAGAGAAA GCGTTTCTGA 1620

TGGAATGCGC AAGCACAGGA AAAACGGTTA TCACTGCGGA AGAAGGCCGG AAAATTGAAC 1680

TCATGTATCA AAGCGTTATG GCTTTGCCGC TGGGGCAATG GCTTGTTGAA AGCGCCGGAC 1740

ACGCTGAATC ATCAATTTAC TGGGAAGATC CTGAAACAGG AATTTTGTGT CGGTGCCGTC 1800

CGGACAAAAT TATCCCTGAA TTTCACTGGA TCATGGACGT GAAAACTACG GCGGATATTC 1860

AACGATTCAA AACCGCTTAT TACGACTACC GCTATCACGT TCAGGATGCA TTCTACAGTG 1920

ACGGTTATGA AGCACAGTTT GGAGTGCAGC CAACTTTCGT TTTTCTGGTT GCCAGCACAA 1980

CTATTGAATG CGGACGTTAT CCGGTTGAAA TTTTCATGAT GGGCGAAGAA GCAAAACTGG 2040

CAGGTCAACA GGAATATCAC CGCAATCTGC GAACCCTGTC TGACTGCCTG AATACCGATG 2100

AATGGCCAGC TATTAAGACA TTATCACTGC CCCGCTGGGC TAAGGAATAT GCAAATGACT 2160

AGATCTCGAG GTACCCGAGC ACGTGTTGAC AATTAATCAT CGGCATAGTA TATCGGCATA 2220

GTATAATACG ACAAGGTGAG GAACTAAACC ATGGCTAAGC AACCACCAAT CGCAAAAGCC 2280

GATCTGCAAA AAACTCAGGG AAACCGTGCA CCAGCAGCAG TTAAAAATAG CGACGTGATT 2340

AGTTTTATTA ACCAGCCATC AATGAAAGAG CAACTGGCAG CAGCTCTTCC ACGCCATATG 2400

ACGGCTGAAC GTATGATCCG TATCGCCACC ACAGAAATTC GTAAAGTTCC GGCGTTAGGA 2460

AACTGTGACA CTATGAGTTT TGTCAGTGCG ATCGTACAGT GTTCACAGCT CGGACTTGAG 2520

CCAGGTAGCG CCCTCGGTCA TGCATATTTA CTGCCTTTTG GTAATAAAAA CGAAAAGAGC 2580

GGTAAAAAGA ACGTTCAGCT AATCATTGGC TATCGCGGCA TGATTGATCT GGCTCGCCGT 2640

TCTGGTCAAA TCGCCAGCCT GTCAGCCCGT GTTGTCCGTG AAGGTGACGA GTTTAGCTTC 2700

GAATTTGGCC TTGATGAAAA GTTAATACAC CGCCCGGGAG AAAACGAAGA TGCCCCGGTT 2760

ACCCACGTCT ATGCTGTCGC AAGACTGAAA GACGGAGGTA CTCAGTTTGA AGTTATGACG 2820

CGCAAACAGA TTGAGCTGGT GCGCAGCCTG AGTAAAGCTG GTAATAACGG GCCGTGGGTA 2880

ACTCACTGGG AAGAAATGGC AAAGAAAACG GCTATTCGTC GCCTGTTCAA ATATTTGCCC 2940

GTATCAATTG AGATCCAGCG TGCAGTATCA ATGGATGAAA AGGAACCACT GACAATCGAT 3000

CCTGCAGATT CCTCTGTATT AACCGGGGAA TACAGTGTAA TCGATAATTC AGAGGAATAG 3060

ATCTAAGCTT CCTGCTGAAC ATCAAAGGCA AGAAAACATC TGTTGTCAAA GACAGCATCC 3120

TTGAACAAGG ACAATTAACA GTTAACAAAT AAAAACGCAA AAGAAAATGC CGATATCCTA 3180 

TTGGCATTTT CTTTTATTTC TTATCAACAT AAAGGTGAAT CCCATACCTC GAGCTTCACG 3240 

CTGCCGCAAG CACTCAGGGC GCAAGGGCTG CTAAAAGGAA GCGGAACACG TAGAAAGCCA 3300

GTCCGCAGAA ACGGTGCTGA CCCCGGATGA ATGTCAGCTA CTGGGCTATC TGGACAAGGG 3360

AAAACGCAAG CGCAAAGAGA AAGCAGGTAG CTTGCAGTGG GCTTACATGG CGATAGCTAG 3420

ACTGGGCGGT TTTATGGACA GCAAGCGAAC CGGAATTGCC AGCTGGGGCG CCCTCTGGTA 3480

AGGTTGGGAA GCCCTGCAAA GTAAACTGGA TGGCTTTCTT GCCGCCAAGG ATCTGATGGC 3540

GCAGGGGATC AAGATCTGAT CAAGAGACAG GATGAGGATC GTTTCGCATG GATATTAATA 3600

CTGAAACTGA GATCAAGCAA AAGCATTCAC TAACCCCCTT TCCTGTTTTC CTAATCAGCC 3660

CGGCATTTCG CGGGCGATAT TTTCACAGCT ATTTCAGGAG TTCAGCCATG AACGCTTATT 3720 

ACATTCAGGA TCGTCTTGAG GCTCAGAGCT GGGCGCGTCA CTACCAGCAG CTCGCCCGTG 3780 

AAGAGAAAGA GGCAGAACTG GCAGACGACA TGGAAAAAGG CCTGCCCCAG CACCTGTTTG 3840

AATCGCTATG CATCGATCAT TTGCAACGCC ACGGGGCCAG CAAAAAATCC ATTACCCGTG 3900

CGTTTGATGA CGATGTTGAG TTTCAGGAGC GCATGGCAGA ACACATCCGG TACATGGTTG 3960

AAACCATTGC TCACCACCAG GTTGATATTG ATTCAGAGGT ATAAAACGAG TAGAAGCTTG 4020

GCTGTTTTGG CGGATGAGAG AAGATTTTCA GCCTGATACA GATTAAATCA GAACGCAGAA 4080 

GCGGTCTGAT AAAACAGAAT TTGCCTGGCG GCAGTAGCGC GGTGGTCCCA CCTGACCCCA 4140

TGCCGAACTC AGAAGTGAAA CGCCGTAGCG CCGATGGTAG TGTGGGGTCT CCCCATGCGA 4200

GAGTAGGGAA CTGCCAGGCA TCAAATAAAA CGAAAGGCTC AGTCGAAAGA CTGGGCCTTT 4260

CGTTTTATCT GTTGTTTGTC GGTGAACGCT CTCCTGAGTA GGACAAATCC GCCGGGAGCG 4320

GATTTGAACG TTGCGAAGCA ACGGCCCGGA GGGTGGCGGG CAGGACGCCC GCCATAAACT 4380

GCCAGGCATC AAATTAAGCA GAAGGCCATC CTGACGGATG GCCTTTTTGC GTTTCTACAA 4440 

ACTCTTTTGT TTATTTTTCT AAATACATTC AAATATGTAT CCGCTCATGA GACAATAACC 4 500 

CTGATAAATG CTTCAATAAT ATTGAAAAAG GAAGAGTATG AGTATTCAAC ATTTCCGTGT 4 560 

CGCCCTTATT CCCTTTTTTG CGGCATTTTG CCTTCCTGTT TTTGCTCACC CAGAAACGCT 4 620 

GGTGAAAGTA AAAGATGCTG AAGATCAGTT GGGTGCACGA GTGGGTTACA TCGAACTGGA 4680 

TCTCAACAGC GGTAAGATCC TTGAGAGTTT TCGCCCCGAA GAACGTTTTC CAATGATGAG 4740

CACTTTTAAA GTTCTGCTAT GTGGCGCGGT ATTATCCCGT GTTGACGCCG GGCAAGAGCA 4800

ACTCGGTCGC CGCATACACT ATTCTCAGAA TGACTTGGTT GAGTACTCAC CAGTCACAGA 4860

AAAGCATCTT ACGGATGGCA TGACAGTAAG AGAATTATGC AGTGCTGCCA TAACCATGAG 4920

TGATAACACT GCGGCCAACT TACTTCTGAC AACGATCGGA GGACCGAAGG AGCTAACCGC 4980 

TTTTTTGCAC AACATGGGGG ATCATGTAAC TCGCCTTGAT CGTTGGGAAC CGGAGCTGAA 5040

TGAAGCCATA CCAAACGACG AGCGTGACAC CACGATGCCT GTAGCAATGG CAACAACGTT 5100 
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GCGCAAACTA TTAACTGGCG AACTACTTAC TCTAGCTTCC CGGCAACAAT TAATAGACTG 5160 

GATGGAGGCG GATAAAGTTG CAGGACCACT TCTGCGCTCG GCCCTTCCGG CTGGCTGGTT 5220 

TATTGCTGAT AAATCTGGAG CCGGTGAGCG TGGGTCTCGC GGTATCATTG CAGCACTGGG 5280 

GCCAGATGGT AAGCCCTCCC GTATCGTAGT TATCTACACG ACGGGGAGTC AGGCAACTAT 5340 

GGATGAACGA AATAGACAGA TCGCTGAGAT AGGTGCCTCA CTGATTAAGC ATTGGTAACT 5400

GTCAGACCAA GTTTACTCAT ATATACTTTA GATTGATTTA CGCGCCCTGT AGCGGCGCAT 5460 

TAAGCGCGGC GGGTGTGGTG GTTACGCGCA GCGTGACCGC TACACTTGCC AGCGCCCTAG 5520 

CGCCCGCTCC TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC TTTCCCCGTC 5580 

AAGCTCTAAA TCGGGGGCTC CCTTTAGGGT TCCGATTTAG TGCTTTACGG CACCTCGACC 5640 

CCAAAAAACT TGATTTGGGT GATGGTTCAC GTAGTGGGCC ATCGCCCTGA TAGACGGTTT 5700

TTCGCCCTTT GACGTTGGAG TCCACGTTCT TTAATAGTGG ACTCTTGTTC CAAACTTGAA 5760 

CAACACTCAA CCCTATCTCG GGCTATTCTT TTGATTTATA AGGGATTTTG CCGATTTCGG 5820

CCTATTGGTT AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT 5880 

TAACGTTTAC AATTTAAAAG GATCTAGGTG AAGATCCTTT TTGATAATCT CATGACCAAA 5940 

ATCCCTTAAC GTGAGTTTTC GTTCCACTGA GCGTCAGACC CCGTAGAAAA GATCAAAGGA 6000 

TCTTCTTGAG ATCCTTTTTT TCTGCGCGTA ATCTGCTGCT TGCAAACAAA AAAACCACCG 6060 

CTACCAGCGG TGGTTTGTTT GCCGGATCAA GAGCTACCAA CTCTTTTTCC GAAGGTAACT 6120 

GGCTTCAGCA GAGCGCAGAT ACCAAATACT GTCCTTCTAG TGTAGCCGTA GTTAGGCCAC 6180 

CACTTCAAGA ACTCTGTAGC ACCGCCTACA TACCTCGCTC TGCTAATCCT GTTACCAGTG 6240

GCTGCTGCCA GTGGCGATAA GTCGTGTCTT ACCGGGTTGG ACTCAAGACG ATAGTTACCG 6300

GATAAGGCGC AGCGGTCGGG CTGAACGGGG GGTTCGTGCA CACAGCCCAG CTTGGAGCGA 6360 

ACGACCTACA CCGAACTGAG ATACCTACAG CGTGAGCTAT GAGAAAGCGC CACGCTTCCC 6420 

GAAGGGAGAA AGGCGGACAG GTATCCGGTA AGCGGCAGGG TCGGAACAGG AGAGCGCACG 6480 

AGGGAGCTTC CAGGGGGAAA CGCCTGGTAT CTTTATAGTC CTGTCGGGTT TCGCCACCTC 6540

TGACTTGAGC GTCGATTTTT GTGATGCTCG TCAGGGGGGC GGAGCCTATG GAAAAACGCC 6600 

AGCAACGCGG CCTTTTTACG GTTCCTGGCC TTTTGCTGGC CTTTTGCTCA CATGTTCTTT 6660 

CCTGCGTTAT CCCCTGATTC TGTGGATAAC CGTATTACCG CCTTTGAGTG AGCTGATACC 6720

GCTCGCCGCA GCCGAACGAC CGAGCGCAGC GAGTCAGTGA GCGAGGAAGC GGAAGAGCGC 6780 

CTGATGCGGT ATTTTCTCCT TACGCATCTG TGCGGTATTT CACACCGCAT AGGGTCATGG 6840 

CTGCGCCCCG ACACCCGCCA ACACCCGCTG ACGCGCCCTG ACGGGCTTGT CTGCTCCCGG 6900 

CATCCGCTTA CAGACAAGCT GTGACCGTCT CCGGGAGCTG CATGTGTCAG AGGTTTTCAC 6960 

CGTCATCACC GAAACGCGCG AGGCAGCAAG GAGATGGCGC CCAACAGTCC CCCGGCCACG 7020 

GGGCCTGCCA CCATACCCAC GCCGAAACAA GCGCTCATGA GCCCGAAGTG GCGAGCCCGA 7080 

TCTTCCCCAT CGGTGATGTC GGCGATATAG GCGCCAGCAA CCGCACCTGT GGCGCCGGTG 7140 
ATGCCGGCCA CGATGCGTCC GGCGTAGAGG ATCTGCTCAT GTTTGACAGC TTATC 7195

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7010 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(vii) IMMEDIATE SOURCE:

(B) CLONE: pBAD-alpha-beta-gamma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1320..2000

(D) OTHER INFORMATION: /product = "red alpha"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 2086..2871

(D) OTHER INFORMATION: /product = "red beta"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 3403..3819

(D) OTHER INFORMATION: /product = "red gamma"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

ATCGATGCAT AATGTGCCTG TCAAATGGAC GAAGCAGGGA TTCTGCAAAC CCTATGCTAC 60
TCCGTCAAGC CGTCAATTGT CTGATTCGTT ACCAATTATG ACAACTTGAC GGCTACATCA 120
TTCACTTTTT CTTCACAACC GGCACGGAAC TCGCTCGGGC TC^CCCCGGT GCATTTTTTA 180
AATACCCGCG AGAAATAGAG TTGATCGTCA AAACCAACAT TGCGACCGAC GGTGGCGATA 240
GGCATCCGGG TGGTGCTCAA AAGCAGCTTC GCCTGGCTGA TACGTTGGTC CTCGCGCCAG 300
CTTAAGACGC TAATCCCTAA CTGCTGGCGG AAAAGATGTG ACAGACGCGA CGGCGACAAG 360
CAAACATGCT GTGCGACGCT GGCGATATCA AAATTGCTGT CTGCCAGGTG ATCGCTGATG 420
TACTGACAAG CCTCGCGTAC CCGATTATCC ATCGGTGGAT GGAGCGACTC GTTAATCGCT 480
TCCATGCGCC GCAGTAACAA TTGCTCAAGC AGATTTATCG CCAGCAGCTC CGAATAGCGC 540
CCTTCCCCTT GCCCGGCGTT AATGATTTGC CCAAACAGGT CGCTGAAATG CGGCTGGTGC 600
GCTTCATCCG GGCGAAAGAA CCCCGTATTG GCAAATATTG ACGGCCAGTT AAGCCATTCA 660
TGCCAGTAGG CGCGCGGACG AAAGTAAACC CACTGGTGAT ACCATTCGCG AGCCTCCGGA 720
TGACGACCGT AGTGATGAAT CTCTCCTGGC GGGAACAGCA AAATATCACC CGGTCGGCAA 780
ACAAATTCTC GTCCCTGATT TTTCACCACC CCCTGACCGC GAATGGTGAG ATTGAGAATA 840
TAACCTTTCA TTCCCAGCGG TCGGTCGATA AAAAAATCGA GATAACCGTT GGCCTCAATC 900
GGCGTTAAAC CCGCCACCAG ATGGGCATTA AACGAGTATC CCGGCAGCAG GGGATCATTT 960
TGCGCTTCAG CCATACTTTT CATACTCCCG CCATTCAGAG AAGAAACCAA TTGTCCATAT 1020
TGCATCAGAC ATTGCCGTCA CTGCGTCTTT TACTGGCTCT TCTCGCTAAC CAAACCGGTA 1080
ACCCCGCTTA TTAAAAGCAT TCTGTAACAA AGCGGGACCA AAGCCATGAC AAAAACGCGT 1140
AACAAAAGTG TCTATAATCA CGGCAGAAAA GTCCACATTG ATTATTTGCA CGGCGTCACA 1200
CTTTGCTATG CCATAGCATT TTTATCCATA AGATTAGCGG ATCCTACCTG ACGCTTTTTA 1260
TCGCAACTCT CTACTGTTTC TCCATACCCG TTTTTTTGGG CTAGCAGGAG GAATTCACC 1319


ATG ACA CCG GAC ATT ATC CTG CAG CGT ACC GGG ATC GAT GTG AGA GCT  1367
Met Thr Pro Asp Ile Ile Leu Gln Arg Thr Gly Ile Asp Val Arg Ala
        290                 295                 300

GTC GAA CAG GGG GAT GAT GCG TGG CAC AAA TTA CGG CTC GGC GTC ATC  1415
Val Glu Gln Gly Asp Asp Ala Trp His Lys Leu Arg Leu Gly Val Ile
    305                 310                 315

ACC GCT TCA GAA GTT CAC AAC GTG ATA GCA AAA CCC CGC TCC GGA AAG  1463
Thr Ala Ser Glu Val His Asn Val Ile Ala Lys Pro Arg Ser Gly Lys
320                 325                 330                 335

AAG TGG CCT GAC ATG AAA ATG TCC TAC TTC CAC ACC CTG CTT GCT GAG  1511
Lys Trp Pro Asp Met Lys Met Ser Tyr Phe His Thr Leu Leu Ala Glu
                340                 345                 350

GTT TGC ACC GGT GTG GCT CCG GAA GTT AAC GCT AAA GCA CTG GCC TGG  1559
Val Cys Thr Gly Val Ala Pro Glu Val Asn Ala Lys Ala Leu Ala Trp
            355                 360                 365

GGA AAA CAG TAC GAG AAC GAC GCC AGA ACC CTG TTT GAA TTC ACT TCC  1607
Gly Lys Gln Tyr Glu Asn Asp Ala Arg Thr Leu Phe Glu Phe Thr Ser
        370                 375                 380

GGC GTG AAT GTT ACT GAA TCC CCG ATC ATC TAT CGC GAC GAA AGT ATG  1655
Gly Val Asn Val Thr Glu Ser Pro Ile Ile Tyr Arg Asp Glu Ser Met
    385                 390                 395

CGT ACC GCC TGC TCT CCC GAT GGT TTA TGC AGT GAC GGC AAC GGC CTT  1703
Arg Thr Ala Cys Ser Pro Asp Gly Leu Cys Ser Asp Gly Asn Gly Leu
400                 405                 410                 415

GAA CTG AAA TGC CCG TTT ACC TCC CGG GAT TTC ATG AAG TTC CGG CTC  1751
Glu Leu Lys Cys Pro Phe Thr Ser Arg Asp Phe Met Lys Phe Arg Leu
                420                 425                 430

GGT GGT TTC GAG GCC ATA AAG TCA GCT TAC ATG GCC CAG GTG CAG TAC  1799
Gly Gly Phe Glu Ala Ile Lys Ser Ala Tyr Met Ala Gln Val Gln Tyr
            435                 440                 445

AGC ATG TGG GTG ACG CGA AAA AAT GCC TGG TAC TTT GCC AAC TAT GAC  1847
Ser Met Trp Val Thr Arg Lys Asn Ala Trp Tyr Phe Ala Asn Tyr Asp
        450                 455                 460

CCG CGT ATG AAG CGT GAA GGC CTG CAT TAT GTC GTG ATT GAG CGG GAT  1895
Pro Arg Met Lys Arg Glu Gly Leu His Tyr Val Val Ile Glu Arg Asp
    465                 470                 475

GAA AAG TAC ATG GCG AGT TTT GAC GAG ATC GTG CCG GAG TTC ATC GAA  1943
Glu Lys Tyr Met Ala Ser Phe Asp Glu Ile Val Pro Glu Phe Ile Glu
480                 485                 490                 495

AAA ATG GAC GAG GCA CTG GCT GAA ATT GGT TTT GTA TTT GGG GAG CAA  1991
Lys Met Asp Glu Ala Leu Ala Glu Ile Gly Phe Val Phe Gly Glu Gln
                500                 505                 510
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S ^^'^ ^^^^^^^^CC CGAGCACGTG ITGACAATTA ATCATCGGCA 2040 



TAGTATATCG GCATAGTATA ATACGACAAG GTGAGGAACT AAACC ATG AGT ACT 

Met Ser Thr 

10 15 

?S Sf„ - - 

25 30 35 

AAA GGT GAT GCC AGC GAT GCG CAG TTC ATC GCA TTA CTG ATC GTT GCC
Lys Gly Asp Ala Ser Asp Ala Gln Phe Ile Ala Leu Leu Ile Val Ala
                40                  45                  50

m ^ ^ TGG ACG AAA GAA ATT TAG GCC TTT CCT 

Asn Gln Tyr Gly Leu Asn Pro Trp Thr Lys Glu Ile Tyr Ala Phe Pro

60 65 

iZ ^, S 

A?g iTe ill fji T T <^^G 

Arg Ile Ile Asn Glu Asn Gln Gln Phe Asp Gly Met Asp Phe Glu Gln

90 95 

"^^^ "^^^ CGG ATT TAG CGC AAG GAC CGT AAT CAT 

Asp Asn Glu Ser Cys Thr Cys Arg Ile Tyr Arg Lys Asp Arg Asn His



110 115 



Pro tT» ^^"^ ?,^T ^ GAA TGC CGC CGC GAA CCA TTC 

Pro Ile Cys Val Thr Glu Trp Met Asp Glu Cys Arg Arg Glu Pro Phe
120 125 



130 



^. S S |i ?li r.^e '^P^o^i^f/S/H- 

160 



?he 'SSr Phi ^l"" ^T'^ '"'^^ <^AT GAA GCC GAG CGC ATT GTC 

Phe Gly Phe Ala Gly Ile Tyr Asp Lys Asp Glu Ala Glu Arg Ile Val

170 JL75 

180 To^ Glu Arg Asp lie Thr 

190 

P^n vIT CAG GAG ATT AAC ACT CTG CTG ATC GCC 

Pro Val Asn Asp Glu Thr Met Gln Glu Ile Asn Thr Leu Leu Ile Ala

2*^0 205 210 

CTG GAT AAA ACA TGG GAT GAC GAC TTA TTG CCG CTC TGT TCC CAG ATA
Leu Asp Lys Thr Trp Asp Asp Asp Leu Leu Pro Leu Cys Ser Gln Ile

220 225 

TTT CGC CGC GAC ATT CGT GCA TCG TCA GAA CTG ACA CAG GCC GAA GCA
Phe Arg Arg Asp Ile Arg Ala Ser Ser Glu Leu Thr Gln Ala Glu Ala

235 240 



2094 



2142 



2190 



2238 



2286



2334 



2382 



2430 



2478 



2526



2574 



2622 



2670 



2718 



2766 



2814 
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™ ^ S S '^y ^. ^„ <f C OC. «>3 ™ ™ 

245 ^ ^ i;^^ Ala Ala Glu Gin Lys Val 

= " 255 

GCA GCA TAG ATCTCGAGAA GCTTCCTGCT GAACATCAAA GGCAAGAAAA 



CATCTGTTGT CAAAGACAGC ATCCTTGAAC AAGGACAATT AACAGTTAAC AAATAAAAAC

GCAAAAGAAA ATGCCGATAT CCTATTGGCA TTTTCTTTTA TTTCTTATCA ACATAAAGGT
GAATCCCATA CCTCGAGCTT CACGCTGCCG CAAGCACTCA GGGCGCAAGG GCTGCTAAAA
GGAAGCGGAA CACGTAGAAA GCCAGTCCGC AGAAACGGTG CTGACCCCGG ATGAATGTCA
GCTACTGGGC TATCTGGACA AGGGAAAACG CAAGCGCAAA GAGAAAGCAG GTAGCTTGCA
GTGGGCTTAC ATGGCGATAG CTAGACTGGC CGGTTTTATG GACAGCAAGC GAACCGGAAT
TGCCAGCTGG GGCGCCCTCT GGTAAGGTTG GGAAGCCCTG CAAAGTAAAC TGGATGGCTT
TCTTGCCGCC AAGGATCTGA TGGCGCAGGG GATCAAGATC TGATCAAGAG ACAGGATGAG
" C A. G. A. ^.^^.OAG A. A.^^^ 

^ 10 



S S - ^. S'^'iT//^ - 

s s= - - ™ 

s fii s z 1^; s=„ If. 
s ?| ™„ 

90 

2g S; g?? S sfr "tT ^ 

95 Thr Arg Ala Phe Asp Asp Asp 

?S ill Sfs ^vS SI 1- £ ^gT. ?A - 



AACGAGTAGA AGCTTGGCTG TTTTGGCGGA TGAGAGAAGA TTTTCAGCCT GATACAGATT
AAATCAGAAC GCAGAAGCGG TCTGATAAAA CAGAATTTGC CTGGCGGCAG TAGCGCGGTG 
GTCCCACCTG ACCCCATGCC GAACTCAGAA GTGAAACGCC GTAGCGCCGA TGGTAGTGTG 
GGGTCTCCCC ATGCGAGAGT AGGGAACTGC CAGGCATCAA ATAAAACGAA AGGCTCAGTC 



2862 

2911 

2971 
3031 
3091 
3151 
3211 
3271 
3331 
3391 
3441 

3489 

3537 

3585 

3633 

3681 

3729 

3777 

3819 

3879 
3939 
3999 
4059 
GAAAGACTGG GCCTTTCGTT TTATCTCTTG TTTGTCGGTG AACGCTCTCC TGAGTAGGAC 4119
AAATCCGCCG GGAGCGGATT TGAACGTTGC GAAGCAACGG CCCGGAGGGT GGCGGGCAGG 4179
ACGCCCGCCA TAAACTGCCA GGCATCAAAT TAAGCAGAAG GCCATCCTGA CGGATGGCCT 4239
TTTTGCGTTT CTACAAACTC TTTTGTTTAT TTTTCTAAAT ACATTCAAAT ATGTATCCGC 4299
TCATGAGACA ATAACCCTGA TAAATGCTTC AATAATATTG AAAAAGGAAG AGTATGAGTA 4359
TTCAACATTT CCGTGTCGCC CTTATTCCCT TTTTTGCGGC ATTTTGCCTT CCTGTTTTTG 4419
CTCACCCAGA AACGCTGGTG AAAGTAAAAG ATGCTGAAGA TCAGTTGGGT GCACGAGTGG 4479
GTTACATCGA ACTGGATCTC AACAGCGGTA AGATCCTTGA GAGTTTTCGC CCCGAAGAAC 4539
GTTTTCCAAT GATGAGCACT TTTAAAGTTC TGCTATGTGG CGCGGTATTA TCCCGTGTTG 4599
ACGCCGGGCA AGAGCAACTC GGTCGCCGCA TACACTATTC TCAGAATGAC TTCGTTGAGT 4659
ACTCACCAGT CACAGAAAAG CATCTTACGG ATGGCATGAC AGTAAGAGAA TTATGCAGTG 4719
CTGCCATAAC CATGAGTGAT AACACTGCGG CCAACTTACT TCTGACAACG ATCGGAGGAC 4779
CGAAGGAGCT AACCGCTTTT TTGCACAACA TGGGGGATCA TGTAACTCGC CTTGATCGTT 4839
GGGAACCGGA GCTGAATGAA GCCATACCAA ACGACGAGCG TGACACCACG ATCCCTGTAG 4899
CAATGGCAAC AACGTTCCGC AAACTATTAA CTGGCGAACT ACTTACTCTA GCTTCCCGGC 4959
AACAATTAAT AGACTGGATG GAGGCGGATA AAGTTGCAGG ACCACTTCTG CGCTCGGCCC 5019
TTCCGGCTGG CTGGTTTATT GCTGATAAAT CTGGAGCCGG TGAGCGTGGG TCTCGCGGTA 5079
TCATTGCAGC ACTGGGGCCA GATGGTAAGC CCTCCCGTAT CGTAGTTATC TACACGACGG 5139
GGAGTCAGGC AACTATGGAT GAACGAAATA GACAGATCGC TGAGATAGGT GCCTCACTGA 5199
TTAAGCATTG GTAACTCTCA GACCAAGTTT ACTCATATAT ACTTTAGATT GATTTACGCG 5259
CCCTGTAGCG GCGCATTAAG CGCGGCGGGT GTGGTGGTTA CGCGCAGCGT GACCGCTACA 5319
CTTGCCAGCG CCCTAGCGCC CGCTCCTTTC GCTTTCTTCC CTTCCTTTCT CGCCACGTTC 5379
GCCGGCTTTC CCCGTCAAGC TCTAAATCGG GGGCTCCCTT TAGGGTTCCG ATTTAGTGCT 5439
TTACGGCACC TCGACCCCAA AAAACTTGAT TTGGGTGATG GTTCACGTAG TGGGCCATCG 5499
CCCTGATAGA CGGTTTTTCG CCCTTTGACG TTGGAGTCCA CGTTCTTTAA TAGTGGACTC 5559
TTGTTCCAAA CTTGAACAAC ACTCAACCCT ATCTCGGGCT ATTCTTTTGA TTTATAAGGG 5619
ATTTTGCCGA TTTCGGCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG 5679
AATTTTAACA AAATATTAAC GTTTACAATT TAAAAGGATC TAGGTGAAGA TCCTTTTTGA 5739
TAATCTCATG ACCAAAATCC CTTAACGTGA GTTTTCGTTC CACTGAGCGT CAGACCCCGT 5799
AGAAAAGATC AAAGGATCTT CTTGAGATCC TTTTTTTCTG CGCGTAATCT GCTGCTTGCA 5859
AACAAAAAAA CCACCGCTAC CAGCGGTGGT TTGTTTGCCG GATCAAGAGC TACCAACTCT 5919
TTTTCCGAAG GTAACTGGCT TCAGCAGAGC GCAGATACCA AATACTGTCC TTCTAGTGTA 5979
GCCGTAGTTA GGCCACCACT TCAAGAACTC TGTAGCACCG CCTACATACC TCGCTCTGCT 6039
AATCCTGTTA CCAGTGGCTG CTGCCAGTGG CGATAAGTCG TGTCTTACCG GGTTGGACTC 6099
AAGACGATAG TTACCGGATA AGGCGCAGCG GTCGGGCTGA ACGGGGGGTT CGTGCACACA 6159
GCCCAGCTTG GAGCGAACGA CCTACACCGA ACTGAGATAC CTACAGCGTG AGCTATGAGA 6219
AAGCGCCACG CTTCCCGAAG GGAGAAAGGC GGACAGGTAT CCGGTAAGCG GCAGGGTCGG 6279
AACAGGAGAG CGCACGAGGG AGCTTCCAGG GGGAAACGCC TGGTATCTTT ATAGTCCTGT 6339
CGGGTTTCGC CACCTCTGAC TTGAGCGTCG ATTTTTGTGA TGCTCGTCAG GGGGGCGGAG 6399
CCTATGGAAA AACGCCAGCA ACGCGGCCTT TTTACGGTTC CTGGCCTTTT GCTGGCCTTT 6459
TGCTCACATG TTCTTTCCTG CGTTATCCCC TGATTCTGTG GATAACCGTA TTACCGCCTT 6519
TGAGTGAGCT GATACCGCTC GCCGCAGCCG AACGACCGAG CGCAGCGAGT CAGTGAGCGA 6579
GGAAGCGGAA GAGCGCCTGA TGCGGTATTT TCTCCTTACG CATCTGTGCG GTATTTCACA 6639
CCGCATAGGG TCATGGCTGC GCCCCGACAC CCGCCAACAC CCGCTGACGC GCCCTGACGG 6699
GCTTGTCTGC TCCCGGCATC CGCTTACAGA CAAGCTGTGA CCGTCTCCGG GAGCTGCATG 6759
TGTCAGAGGT TTTCACCGTC ATCACCGAAA CGCGCGAGGC AGCAAGGAGA TGGCGCCCAA 6819
CAGTCCCCCG GCCACGGGGC CTGCCACCAT ACCCACGCCG AAACAAGCGC TCATGAGCCC 6879
GAAGTGGCGA GCCCGATCTT CCCCATCGGT GATGTCGGCG ATATAGGCGC CAGCAACCGC 6939
ACCTGTGGCG CCGGTGATGC CGGCCACGAT GCGTCCGGCG TAGAGGATCT GCTCATGTTT 6999
GACAGCTTAT C
7010 
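
The CDS features annotated for SEQ ID NO: 11 can be cross-checked against the protein listings of SEQ ID NOs: 12 to 14. The following sketch is illustrative only and is not part of the original listing. It assumes the standard genetic code, takes the feature coordinates and the stated protein lengths from the listing headers, and translates the first sixteen codons of the red alpha CDS. The names `FEATURES`, `codon_count` and `translate` are hypothetical helpers introduced here.

```python
# Illustrative consistency check; helper names are assumptions, not from the listing.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# name -> (start, end, stated length of the protein listing);
# coordinates are 1-based and inclusive, as in the feature table.
FEATURES = {
    "red alpha": (1320, 2000, 227),
    "red beta": (2086, 2871, 262),
    "red gamma": (3403, 3819, 139),
}


def codon_count(start, end):
    """Number of codons spanned by an inclusive coordinate pair."""
    span = end - start + 1
    if span % 3:
        raise ValueError("CDS span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate a DNA string (spaces allowed) into one-letter amino acids."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


for name, (start, end, stated) in FEATURES.items():
    print(name, codon_count(start, end), stated)

# First row of the red alpha CDS as printed in the listing.
first_row = "ATG ACA CCG GAC ATT ATC CTG CAG CGT ACC GGG ATC GAT GTG AGA GCT"
print(translate(first_row))
```

Each coordinate pair spans a whole number of codons, and the codon count, which includes the terminal stop codon shown as * in the protein listings, equals the stated length of the corresponding protein sequence.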



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 227 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Thr Pro Asp Ile Ile Leu Gln Arg Thr Gly Ile Asp Val Arg Ala
1               5                   10                  15

Val Glu Gln Gly Asp Asp Ala Trp His Lys Leu Arg Leu Gly Val Ile
            20                  25                  30

Thr Ala Ser Glu Val His Asn Val Ile Ala Lys Pro Arg Ser Gly Lys
        35                  40                  45

Lys Trp Pro Asp Met Lys Met Ser Tyr Phe His Thr Leu Leu Ala Glu
    50                  55                  60

Val Cys Thr Gly Val Ala Pro Glu Val Asn Ala Lys Ala Leu Ala Trp
65                  70                  75                  80

Gly Lys Gln Tyr Glu Asn Asp Ala Arg Thr Leu Phe Glu Phe Thr Ser
                85                  90                  95

Gly Val Asn Val Thr Glu Ser Pro Ile Ile Tyr Arg Asp Glu Ser Met
            100                 105                 110

Arg Thr Ala Cys Ser Pro Asp Gly Leu Cys Ser Asp Gly Asn Gly Leu
        115                 120                 125

Glu Leu Lys Cys Pro Phe Thr Ser Arg Asp Phe Met Lys Phe Arg Leu
    130                 135                 140

Gly Gly Phe Glu Ala Ile Lys Ser Ala Tyr Met Ala Gln Val Gln Tyr
145                 150                 155                 160

Ser Met Trp Val Thr Arg Lys Asn Ala Trp Tyr Phe Ala Asn Tyr Asp
                165                 170                 175

Pro Arg Met Lys Arg Glu Gly Leu His Tyr Val Val Ile Glu Arg Asp
            180                 185                 190

Glu Lys Tyr Met Ala Ser Phe Asp Glu Ile Val Pro Glu Phe Ile Glu
        195                 200                 205

Lys Met Asp Glu Ala Leu Ala Glu Ile Gly Phe Val Phe Gly Glu Gln
    210                 215                 220

Trp Arg *
225



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 262 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Ser Thr Ala Leu Ala Thr Leu Ala Gly Lys Leu Ala Glu Arg Val
1               5                   10                  15

Gly Met Asp Ser Val Asp Pro Gln Glu Leu Ile Thr Thr Leu Arg Gln
            20                  25                  30

Thr Ala Phe Lys Gly Asp Ala Ser Asp Ala Gln Phe Ile Ala Leu Leu
        35                  40                  45

Ile Val Ala Asn Gln Tyr Gly Leu Asn Pro Trp Thr Lys Glu Ile Tyr
    50                  55                  60

Ala Phe Pro Asp Lys Gln Asn Gly Ile Val Pro Val Val Gly Val Asp
65                  70                  75                  80

Gly Trp Ser Arg Ile Ile Asn Glu Asn Gln Gln Phe Asp Gly Met Asp
                85                  90                  95

Phe Glu Gln Asp Asn Glu Ser Cys Thr Cys Arg Ile Tyr Arg Lys Asp
            100                 105                 110

Arg Asn His Pro Ile Cys Val Thr Glu Trp Met Asp Glu Cys Arg Arg
        115                 120                 125

Glu Pro Phe Lys Thr Arg Glu Gly Arg Glu Ile Thr Gly Pro Trp Gln
    130                 135                 140

Ser His Pro Lys Arg Met Leu Arg His Lys Ala Met Ile Gln Cys Ala
145                 150                 155                 160

Arg Leu Ala Phe Gly Phe Ala Gly Ile Tyr Asp Lys Asp Glu Ala Glu
                165                 170                 175

Arg Ile Val Glu Asn Thr Ala Tyr Thr Ala Glu Arg Gln Pro Glu Arg
            180                 185                 190

Asp Ile Thr Pro Val Asn Asp Glu Thr Met Gln Glu Ile Asn Thr Leu
        195                 200                 205

Leu Ile Ala Leu Asp Lys Thr Trp Asp Asp Asp Leu Leu Pro Leu Cys
    210                 215                 220

Ser Gln Ile Phe Arg Arg Asp Ile Arg Ala Ser Ser Glu Leu Thr Gln
225                 230                 235                 240

Ala Glu Ala Val Lys Ala Leu Gly Phe Leu Lys Gln Lys Ala Ala Glu
                245                 250                 255

Gln Lys Val Ala Ala *
            260

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 139 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



Met Asp Ile Asn Thr Glu Thr Glu Ile Lys Gln Lys His Ser Leu Thr
1               5                   10                  15

Pro Phe Pro Val Phe Leu Ile Ser Pro Ala Phe Arg Gly Arg Tyr Phe
            20                  25                  30

His Ser Tyr Phe Arg Ser Ser Ala Met Asn Ala Tyr Tyr Ile Gln Asp
        35                  40                  45

Arg Leu Glu Ala Gln Ser Trp Ala Arg His Tyr Gln Gln Leu Ala Arg
    50                  55                  60

Glu Glu Lys Glu Ala Glu Leu Ala Asp Asp Met Glu Lys Gly Leu Pro
65                  70                  75                  80

Gln His Leu Phe Glu Ser Leu Cys Ile Asp His Leu Gln Arg His Gly
                85                  90                  95

Ala Ser Lys Lys Ser Ile Thr Arg Ala Phe Asp Asp Asp Val Glu Phe
            100                 105                 110

Gln Glu Arg Met Ala Glu His Ile Arg Tyr Met Val Glu Thr Ile Ala
        115                 120                 125

His His Gln Val Asp Ile Asp Ser Glu Val *
    130                 135
Table 1: Sequences of Oligos for PCR



Figure 3ab 

left: TGACCCCTCACAAGGAGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA 
right: TACAAATGTGGTATGGCTGATTATGATCCTCTAGAGTCGGTGCTCACTGCCCGCTTTCCA 
template: pJP5603 

targeting vector: pSV-paZ11
Figure 3c 

a-left: CTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA

a-right: ATGATCCTCTAGAGTCGGTGCTCACTGCCCGCTTTCCA

b-left: AGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA

b-right: GCTGATTATGATCCTCTAGAGTCGGTGCTCACTGCCCGCTTTCCA

c-left: CACAAGGAGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA

c-right: TGGTATGGCTGATTATGATCCTCTAGAGTCGGTGCTCACTGCCCGCTTTCCA

d-left: TGACCCCTCACAAGGAGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA

d-right: TACAAATGTGGTATGGCTGATTATGATCCTCTAGAGTCGGTGCTCACTGCCCGCTTTCCA

e-left: CACGCCCCTGACCCCTCACAAGGAGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA

e-right: TAAAACCTCTACAAATGTGGTATGGCTGATTATGATCCTCTAGAGTCGGTGCTCACTGCCCGCTTTCCA

f-left: TCCCCTGACCCACGCCCCTGACCCCTCACAAGGAGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA

f-right: TAAAGCAAGTAAAACCTCTACAAATGTGGTATGGCTGATTATGATCCTCTAGAGTCGGTGCTCACTGCCCGCTTTCCA

template: pJP5603

targeting vector: pSV-paZ11

Figure 3d 
a-left: TCATCCTCTGCATGGTCAGGTCATGGATGAGCAGACGATGGTGCAGGATCAAGGGCTGCTAAAGGAA

a-right: TAATGCGAACAGCGCACGGCGTTAAAGTTGTTCTGCTTCATCAGCAGGATGGCGAAGAACTC^

b-left: CACGAGCATCATCCTCTGCATGGTCAGGTCATGGATGAGCAGACGATGGCAAGGGCTGCTAAAGGAA

b-right: TAATGCGAACAGCGCACGGCGTTAAAGTTGTTCTGCTTCATCAGCAGGATGGCGAAGAACTCCAGCAT

c-left: TTAACCGTCACGAGCATCATCCTCTGCATGGTCAGGTCATGGATGAGCACAAGGGCTGCTAAAGGAA

c-right: TAATGCGAACAGCGCACGGCGTTAAAGTTGTTCTGCTTCATCAGCAGGATGGCGAAGAACTCCAGCAT

d-left: TGCTGCTGAACGGCAAGCCGTTGCTGATTCGAGGCGTTAACCGTCACGACAAGGGCTGCTAAAGGAA

d-right: TAATGCGAACAGCGCACGGCGTTAAAGTTGTTCTGCTTCATCAGCAGGATGGCGAAGAACTCCAGCAT

e-left: TCTCTATCGTGCGGTGGTTGAACTGCACACCGCCGACGGCACGCTGATTCAAGGGCTGCTAAAGGAA

e-right: TAATGCGAACAGCGCACGGCGTTAAAGTTGTTCTGCTTCATCAGCAGGATGGCGAAGAACTCCAGCAT

f-left: TGGAGTGACGGCAGTTATCTGGAAGATCAGGATATGTGGCGGATGAGCGCAAGGGCTGCTAAAGGAA

f-right: TAATGCGAACAGCGCACGGCGTTAAAGTTGTTCTGCTTCATCAGCAGGATGGCGAAGAACTCCAGCAT

g-left: TGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCAAGGGCTGCTAAAGGAA

g-right: TAATGCGAACAGCGCACGGCGTTAAAGTTGTTCTGCTTCATCAGCAGGATGGCGAAGAACTCCAGCAT

h-left: TGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCAAGGGCTGCTAAAGGAA

h-right: TATTTTTGACACCAGACCAACTGGTAATGGTAGCGACCGGCGCTCAGCTGGCGAAGAACTCCAGCAT

template: pJP5603

targeting vector: pSV-paZ11

Figure 4 
left: 

TCAAGTGAGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGCTTATGCCCACCAGC 

TGGTATGGCTGATTATGATC 

right: 

TCCAACATGGATGCTGATTTATATGGGTATAAATGGGCTCGCGATAATGTCGGGCAATCAGGTGCGACA 
ATCTACCACCAGCTCTTTTCTACGGGGTCTGACGC 
template: pBR322 
targeting vector: Hoxa-P1

Figure 5 
left: 

TGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTTAATACGACTCACTATAGGGAGAACA 

GGAAACAGCTATGCCCATAACACCCAGAGTA 

right: 

TGCGCCGCTACAGGGCGCGTCCATTCGCCATTCAGGCCTGACTCACTAGTGATGGTGATGGTGATGTGG 
GGGGTGCCGCTCAGT 

template: pmtrx (a pBluescript vector carrying mouse trithorax cDNA)
targeting vector: pZero2.1

Figure 6 
left: 

TGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGGAGAAAAAAATCACT

GGATATACCACCG 

right: 

TACAGGGCGCGTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACGCCCCGCCCTGC 

CACTCATCGCA 

template: pMAK705 

targeting vector: pBAD-24 backbone Amp resistant gene 

Figure 8 

i: 

TGCCAAGCTTGACCCACTGTGGAAGTGTTCCAAAAAGCGGGAAGGCTCTTGAGCTACTTCACTAACAAC 

CGG 

g:

TCACCATCTTCGGGCCATTTGTAGACTGGAATATTTCGAGCTATGAGTGTGCTACTTCACTAACAACCG 
G 

h: 

TGGCCCCAGGGTGACGCGGACATGGAGTTGTCGCCAGGGCACTGGTCCATGAGAGTGCCAAGCTACTC 
GCGAC 
template: pKaZ 
targeting vector: Hoxa-P1
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Figure 9 

TAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCT 

TTGCCTGGTTTATAAGTTCGTATAGCATACATTATACGAAGTTATGGGCTGCTAAAGGAAGCGGAACAC 

G 

k: 

TGGCAGTTCAGGCCAATCCGCGCCGGATGCGGTGTATCGCTCGCCACTTCAACATCAACGGTAATCGCC 
ATTTGACCATATAACTTCGTATAATGTATGCTATACGAAGTTATCCCCAGAGTCCCGCTCAGAAGAACT 
template: pJP5603 

targeting vector: JC9604 chromosome 

Figure 10 
l:

TAGCTTGGCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACCCATCAC 

ATATACCTGCCGTTCACTAT 

m: 

TATCGGTGGCCGTGGTGTCGGCTCCGCCGCCTTCATACTGCACCGGGCGGGAAGGCGATTCCGAAGCCC 
AACCTTTCATAGAAGCC 
template: pIB279
targeting vector: pSV-paZ11

l*: GCTTGGCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAA
m*: 

TCGGTGGCCGTGGTGTCGGCTCCGCCGCCTTCATACTGCACCGGGCGGGAAGGATCCACAGATTTGATC 
CAGCGATACAGC 
template: pSV-paZ11
targeting vector: pSV-sacB-neo 

Figure 11
n: 

TACCGCATTAAAGCTTATCGATGATAAGCTGTCAAACATGAGAATTGACCCGGAACCCTTCTCGAGGAA 
GTTCCTATTCTCTAGAAAGTATAGGAACTTCCGAATAAATACCTGTGACGGAAGATCACTT 

TTCCCTCAAGAATTTTACTCTGTCAGAAACGGCCTTAACGACGTAGTCGAGGGACCTAGAAGTTCCTAT 
ACTTTCTAGAGAATAGGAACTTCATTATCACTTATTCAGGCGTAGCACCAGGCG 
template: pMAK705 
targeting vector: Hoxa-P1

Figure 12 
left: 

TGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGGAGAAAAAAATCACT 

GGATATACCACCG 

right: 

TACAGGGCGCGTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACGCCCCGCCCTGC 

CACTCATCGCA 

template: pMAK705 

targeting vector: pBAD-24 backbone Amp resistant gene 
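
Several oligo sets in Table 1 appear to be nested: the left oligos a to d listed for Figure 3c end in the same 3' sequence and differ only in how far they extend at the 5' end. The sketch below is a hedged illustration of how that shared 3' portion and the length of each 5' extension could be computed. It is not part of the original table, and the names `common_suffix` and `LEFT_OLIGOS` are hypothetical.

```python
# Illustrative only: find the 3' sequence shared by a set of oligos
# and report how many extra 5' nucleotides each oligo carries.

# Left oligos a to d for Figure 3c, copied from Table 1.
LEFT_OLIGOS = {
    "a": "CTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA",
    "b": "AGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA",
    "c": "CACAAGGAGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA",
    "d": "TGACCCCTCACAAGGAGACGACCTTCCATGACCGAGTACAAGAGGGATGTAACGCACTGA",
}


def common_suffix(seqs):
    """Return the longest suffix (3' end) shared by all sequences."""
    seqs = list(seqs)
    shortest = min(seqs, key=len)
    n = 0
    # Walk backwards from the 3' end while every sequence agrees.
    while n < len(shortest) and all(s[-1 - n] == shortest[-1 - n] for s in seqs):
        n += 1
    return shortest[len(shortest) - n:]


shared = common_suffix(LEFT_OLIGOS.values())
extensions = {name: len(seq) - len(shared) for name, seq in LEFT_OLIGOS.items()}

print("shared 3' portion:", shared)
for name, extra in extensions.items():
    print(name, "5' extension length:", extra)
```

For this set the shared 3' portion is the whole of oligo a, and oligos b, c and d each add further nucleotides at the 5' end. The same function can be applied to the right oligos or to the Figure 3d set.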
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